Abstract


A procedure was devised whereby an ordinal similarity scale of
all possible pairs of stimuli in a set was derived for individual Ss from
Ss' rating of a portion of the set of all possible pairs of stimulus pairs.
This was done by supplementing the information from Ss' ratings with
information provided by assumptions.

64 Ss rated for similarity two samples of verbal stimuli, each
sample consisting of material of both high and low meaningfulness. Ss
also practiced learning verbal responses to these stimuli for either 2,
4, 8, or 16 trials in a paired-associate learning (PAL) task. The
similarity rating of the material was done both before and after the PAL
procedure.


The  reliability  of  the  procedure,  as  assessed  by  a  measure  of 
internal  consistency  of  response  within  a  single  rating  session  and  by 
test-retest  reliability  of  repeated  ratings,  was  moderately  high.  Internal 
consistency  of  ratings  was  reliably  affected  by  differences  in  Ss,  specific 
samples  of  stimuli,  and  previous  rating  of  stimuli  at  a  different  level  of 
meaningfulness,  but  not  by  the  meaningfulness  of  the  stimuli  or  previous 
use of stimuli in a PAL task. Test-retest reliability was affected by the
meaningfulness of the stimuli, but not by use of the stimuli in PAL between
ratings. Ss showed highly stereotyped standards of similarity for
low-meaningful stimuli, but more idiosyncratic standards in rating
high-meaningful stimuli. Rated similarity was reliably related to confusion
errors  after  4  trials,  but  not  after  2,  8,  or  16  trials  of  PAL,  and  to 
overt  recall  errors  over  the  first  7  trials  of  PAL.  Similarity  rating 
of  stimuli  before  PAL  suppressed  correct  recall  of  responses  to  the 
stimuli  during  the  early  stage  of  practice. 
Chapter I


Introduction 

An important concept in contemporary verbal learning theory is
that of similarity. Speculation as to the nature of similarity and its
influence on human learning is at least as old as the British
Associationists, and an attempt was made by Robinson as early as 1920 to use
similarity as a construct in what would now be considered modern verbal
learning theory and experimentation. Robinson’s work was concerned
with similarity of verbal material as a predictor of retroactive
inhibition, still a topic of interest today. Later landmarks in the
development of the concept of similarity in verbal learning were Gibson’s
(1940) use of stimulus generalization principles derived from conditioning
experiments to explain phenomena such as intra-list interference in
acquisition, Osgood's (1949) expansion and reformulation of the research
sparked by Robinson to include transfer effects, and Bousfield’s (1953)
finding that subjects "cluster" similar responses in free recall. Current
indications that interest in similarity has increased rather than
waned are a recent book entitled Paired-associates learning: The role
of meaningfulness, similarity and familiarization (Goss and Nodine,
1965) and a chapter in a collection of readings surveying verbal learning
(Kausler, 1966) devoted entirely to the topic of the effect of similarity
upon acquisition of verbal material.

With this increasing concern for similarity as a concept in verbal
learning has come a growing awareness of a number of problems in attempting
to operationally define and manipulate similarity. This thesis is
concerned with an examination of how similarity can be specified for
verbal stimuli and with an experimental evaluation of a proposed
procedure for scaling the similarity of verbal stimuli.

The  Nature  of  Similarity  in  Psychology 

During the last two decades a number of articles (e.g., Attneave,
1950; Noble, 1957; Wallach, 1958; Coombs, 1964) have examined the concept
of similarity as used in psychology. This paper will not attempt an
exhaustive review of these critiques, nor will it attempt to deal with
various mathematical difficulties posed by attempts to measure similarity;
rather, the discussion here will be restricted to a number of more
or less general points which have been raised and which are particularly
relevant to the measurement of similarity in verbal learning. The
points discussed include the distinction between psychological and
physical similarity, models of physical similarity, and the problem of
multidimensionality in similarity relationships.

Psychological  and  Physical  Similarity 

Probably the most important distinction to be made in this regard
is the one between psychological and physical similarity. When a learning
theorist defines stimulus generalization as the tendency of an
organism to give a conditioned response to stimuli similar to the training
stimulus, or when a Gestalt psychologist states that similarity of stimuli
induces perceptual grouping, it is almost invariably physical similarity
that is referred to. Physical similarity is determined, as Wallach (1958)
puts it, by the "[commonality of the environmental properties which are]
established by the identity of measurement readings of the
centimeters-grams-seconds variety in two situations being compared" (p. 107). In
other words, physical similarity is defined by operations that are only
incidental to a perceiving organism. For this reason Wallach refers to
similarity defined in this way as "potential similarity", emphasizing
that physical similarity is not necessarily related to psychological
similarity.

Psychological similarity, on the other hand, can be loosely defined
as the commonality of two situations as it is perceived by an organism,
and thus it must be inferred from behavior. Although psychological
similarity is not independent of physical similarity, it can be different
from  physical  similarity  for  at  least  two  reasons.  (1)  An  organism  may 
respond  selectively  to  only  a  few  aspects  of  a  stimulus  situation,  and 
it  need  not  respond  equally  to  those  aspects  that  it  selects.  (2)  The 
responses  that  are  elicited  by  stimuli  in  higher  organisms  are  seldom 
fixed,  as  they  are  often  dependent  on  previous  learning.  Thus,  if  we 
are  to  use  the  concept  of  similarity  in  any  but  the  crudest  of  fashions 
in  psychology,  we  must  be  prepared  to  look  beyond  the  physical  properties 
of  the  stimulus  to  the  organism’s  mediating  responses. 

Models  of  Physical  Similarity 

Even attempting to specify the similarity of physical properties
alone, however, presents problems. There would appear to be two main
models on which the concept of physical resemblance can be patterned.
The first is known as the common elements theory; it states that two
situations resemble each other to the extent that they share identical
constituent parts. The second theory, sometimes known as the dimensional
model, assumes that one stimulus resembles another according to their
proximity on some common attributive dimension. It can be seen that
these models complement each other to a certain extent. The common
elements theory would intuitively seem most suited to the specification
of the relationship between compound stimuli which can be broken down
into easily-specified constituent parts (e.g., trigrams, which can be
described as combinations of letter elements). The dimensional theory,
on the other hand, would seem more suitable to the relating of stimuli
possessing measurable properties such as objects of a particular size,
hue, and brightness. Unfortunately, it is difficult to describe simple
criteria, whether dimensional or elemental, for any but the simplest of
stimuli. This brings us to the problem of determining similarity
relations for stimuli that differ on a number of attributes.

Multidimensionality of Similarity Relations

James (1890) has pointed out that the moon is considered as
similar to both a gas-jet and a football, but that the gas-jet in no
way resembles the football. This is because the resemblance between
moon and gas-jet is mediated by their common property of luminosity,
while football and moon share the attribute of rotundity; however, James
states, gas-jet and football have no properties in common. James'
example is perhaps exaggerated: gas-jets must have some recognizable shape,
and a football must reflect a certain amount of light to be seen.
Nevertheless, James emphasizes the importance of an aspect of similarity that
has not always been recognized: that it is a multidimensional relationship.

Noble (1957) points out that similarity is not a dimension along
which single stimuli can be scaled, a distinction that some researchers
(e.g., Haagen, 1949) have not always made clear. Rather, it is an
intransitive relationship between two objects; A may be similar in some degree
to B, and B may be similar to C, but these statements tell us nothing about
the similarity of A to C. This is because the similarity between A and B
can be mediated by those attributes or elements that A and B have in
common, while the resemblance between B and C might be due to a set of
attributes or elements that include all, some, or none of those common
to A and B. This multidimensional aspect of similarity must be
considered in any attempt to manipulate similarity experimentally.
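
The intransitivity described above can be illustrated in common-elements
terms. The following Python sketch is only an illustration: the attribute
sets are hypothetical labels chosen to mirror James' example, not data
from any study, and the function name is an assumption introduced here.

```python
# Hypothetical attribute sets mirroring James' (1890) example.
attributes = {
    "moon": {"luminous", "round"},
    "gas-jet": {"luminous"},
    "football": {"round"},
}


def shared(a, b):
    """Return the attributes that stimuli a and b have in common."""
    return attributes[a] & attributes[b]


print(shared("moon", "gas-jet"))      # {'luminous'}
print(shared("moon", "football"))     # {'round'}
print(shared("gas-jet", "football"))  # set()
```

Each of the first two pairs shares an attribute, yet the third pair shares
none, so resemblance to a common third stimulus implies nothing about the
resemblance of the two stimuli to each other.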

In  summary,  previous  discussions  of  the  general  nature  of  the 
concept  of  similarity  in  psychological  theory  have  pointed  out  a  number 
of  areas  where  clear  definitions  of  terms  and  relationships,  although 
difficult,  are  necessary.  These  are  the  operational  definition  of 
physical  similarity,  the  connection  between  physical  and  psychological 
similarity, and the multidimensional nature of the similarity
relationship.

Measures of Similarity Used Previously in Verbal Learning

A wide variety of models of similarity, both physical and
psychological, have been used previously in verbal learning studies. The
procedures used to scale similarity have also varied widely, ranging
from direct rating techniques where subjects are instructed to provide
judgments of similarity through to what Flavell (1961a) terms "functional
equivalence" measures. In the latter the similarity of the two stimuli
is defined as the degree to which one stimulus can apparently replace
another in some stimulus-response relationship. Examples of functional
equivalence measures typically involve phenomena such as generalization,
transfer of training, clustering in free recall, and interference.
Although the functional equivalence measures have less intuitive appeal as
indicants of similarity than other procedures such as direct rating, they
are important not only because they have occasionally been used directly
as a defining operation for stimulus similarity (e.g., Gibson, 1940),
but also because functional equivalence measures have frequently been
used as a criterion by which to evaluate other types of similarity
measure.

This section reviews, to the writer's knowledge, all the major
methods used previously to scale similarity among verbal stimuli. The
studies cited, however, are intended to illustrate the scaling
procedures rather than exhaustively review all previous research. The measures
described include three variations of a common elements model (with
letters, phonemes and associations as elements), direct scaling of
stimulus pairs, latency of similarity judgments, measures involving number
of similarity criterion responses, estimates of the co-occurrence of the
referents of meaningful words, and the semantic differential scaling
technique, all of which involve direct ratings of stimuli. Also
described are a number of measures based on functional equivalence of
stimuli.

Common  Elements 

Three  scaling  methods  are  based  on  the  assumption  that  stimulus 
similarity  is  mediated  by  common  elements.  Two  of  these  are  measures 
of  the  physical  similarity  of  stimuli,  considering  letters  and  phonemes 
as elements, while the third measures psychological similarity by
considering as elements association responses to stimuli.

Letter elements. Underwood has used the technique (e.g., Feldman
and Underwood, 1957) of constructing lists of trigram stimuli from a
limited number of letter elements, with similarity due to element
duplication increasing with decreasing numbers of constituent letters.
Abbott and Price (1964) used a 3-letter nonsense syllable as a training
stimulus in an eyeblink conditioning procedure, then tested subjects'
responses to the same syllable and to nonsense syllables with two, one,
or no letters in common with the training stimulus. In a more sensitive
form of this procedure, Runquist and Joinson (in press) varied not only
number of common elements between pairs of nonsense syllables but also
the position of the repeated letters in the sequence.
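
As a rough illustration of the letter-element idea, the sketch below
counts shared letters between two trigrams, first ignoring position and
then requiring the same serial position. The function names and example
trigrams are hypothetical, and the counting rules are an assumption made
for illustration rather than the scoring used in any of the studies cited.

```python
def common_letters(a, b):
    """Count distinct letters shared by two trigrams, ignoring position."""
    return len(set(a) & set(b))


def same_position_letters(a, b):
    """Count letters occupying the same serial position in both trigrams."""
    return sum(1 for x, y in zip(a, b) if x == y)


print(common_letters("XQZ", "QXB"))         # 2
print(same_position_letters("XQZ", "QXB"))  # 0
print(same_position_letters("XQZ", "XQB"))  # 2
```

The first two calls show why position matters: XQZ and QXB share two
letters, but none of them in the same position.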

Phoneme elements. Conrad (1962, 1964) and Wickelgren (1965,
1966) have studied the effects of acoustic similarity of letters and
digits as measured by common phonemes in experiments where material was
presented visually or aurally. Strictly speaking, this is a measure of
physical similarity for aurally-presented stimuli only; the phonemic
similarity of visually-presented verbal stimuli is mediated by a learned
vocal response. However, for purposes of this analysis it would seem to
do no harm to consider measures of phonemic similarity as being
essentially equivalent for visually and aurally-presented stimuli.

Associative response elements. The third common-elements method
is a measure of psychological, rather than physical, similarity. It
treats as constituent elements of a stimulus the association responses
elicited by it in an association test. The basic procedure is to
calculate the degree to which association responses to two stimuli
overlap; e.g., stimuli A and B might elicit few or no association
responses in common, indicating that their similarity is low, while stimuli
C and D might elicit many of the same responses (including each other),
indicating high similarity. It is beyond the scope of this paper to
describe the large number of different procedures that exist for
calculating an index of similarity by this method (Marshall and Cofer, 1963;
Goss and Nodine, 1965; Flavell and Johnson, 1961); however, the principle
of overlap or commonality of associative responses is essentially the
same for all of them.
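
The overlap principle can be sketched in a few lines of Python. The ratio
used here (shared responses divided by all distinct responses) is an
assumption chosen for simplicity; it is not any of the specific indices
cited above, and the response lists are invented for the example.

```python
def overlap_index(responses_a, responses_b):
    """Proportion of distinct association responses common to both stimuli.

    Illustrative ratio only: shared responses / all distinct responses.
    """
    a, b = set(responses_a), set(responses_b)
    if not a and not b:
        return 0.0  # no responses at all: treat overlap as zero
    return len(a & b) / len(a | b)


# Invented association responses to two hypothetical stimulus words.
to_first = ["chair", "desk", "wood"]
to_second = ["desk", "wood", "leg", "cloth"]
print(overlap_index(to_first, to_second))  # 0.4
```

A value near 0 corresponds to few or no responses in common, and a value
near 1 to nearly identical sets of responses.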

Direct  Scaling 

In a number of studies subjects have directly rated the similarity
of pairs of stimuli by various scaling methods. McGeoch and
McDonald (1931) had judges place 60 pairs of synonyms into three equal
groups on the basis of decreasing similarity, and then picked the ten
most consistently judged pairs from each group to represent three degrees
of intra-pair similarity. Haagen's (1949) subjects used a seven-point
scale with similarity defined as "the extent to which words denote the
same or similar objects, actions, or conditions" (p. 454). The stimuli
rated were common two-syllable adjectives, previously arranged in sets
of six as being similar in dictionary definition. One member of each
set was selected as a focus, and median ratings of the similarity of
this focal word to the other five members were calculated for 80
subjects. In a study that rated trigrams of various types and association
values rather than meaningful words, Runquist and Joinson (in press)
had subjects give direct magnitude estimates of numbers between 0 and
100 to represent the similarity of pairs of stimuli. Garskof and Houston
(1963) used a similar procedure for scaling pairs of meaningful words
except that their subjects responded by marking an "X" on a 5-inch line
with ends marked 0 and 1.

Latency 

Flavell (Flavell, 1961b; Flavell and Johnson, 1961) presented
subjects with pairs of stimuli and asked them to think of some way in
which the stimuli were similar. The time period between presentation
of the pair and the subject's report of some basis for perceiving the
stimuli as similar was used as an inverse measure of similarity.

Production  of  Similarity  Criteria 

Flavell and Johnson (1961) had subjects write as many similarities
between a given pair of stimuli as they could in one minute.
Subjects were told that the similarities need not be familiar or logical,
but should be "genuine ways in which the referents were alike" (p. 339).
A number of indices were derived from these data and also from the data
of the single dominant responses reported in the latency procedure
described above. All of these indices estimated similarity either
through number of criteria produced per stimulus-pair or through measures
of stereotypy of criteria between subjects.

Co-occurrence  of  Referents 

Flavell (Flavell, 1961b; Flavell and Johnson, 1961) had subjects
give probability ratings of the co-occurrence of the referents of
meaningful words (adjectives, concrete nouns, and abstract nouns) in the
environment. These probability estimates were used as measures of the
semantic similarity of the pairs.

Semantic  Differential 

The semantic differential technique (Osgood, 1952; Osgood, Suci,
and Tannenbaum, 1957) was originally constructed as a method for measuring
the meaning of single stimuli. Based on Osgood’s (1953) mediating
response theory of meaning, it assumes that the meaning of a verbal
stimulus can be specified by its position on a number of scales defined by
bipolar adjectives, e.g., "fast-slow", "good-bad", "strong-weak".
Factor-analytic studies have shown that most of the variance of the individual
adjective-scales could be accounted for by three main semantic factors.
These factors have been used by Osgood and his associates to describe a
three-dimensional semantic space, within which the meaning of verbal
symbols could be specified by their projections on the three sets of
defining coordinates. Assuming that this semantic space is Euclidean,
similarity of meaning of two stimuli can be equated with the distance
between the two points representing the two stimuli as determined by
the generalized Pythagorean theorem (Osgood and Suci, 1952).
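
Under the Euclidean assumption just stated, the distance between two
stimuli is the square root of the sum of squared differences between their
coordinates. The Python sketch below illustrates that calculation only;
the function name and the factor scores are hypothetical and are not taken
from any published semantic differential data.

```python
import math


def semantic_distance(p, q):
    """Euclidean distance between two points in a semantic space.

    p and q are sequences of factor scores, one score per semantic factor.
    """
    if len(p) != len(q):
        raise ValueError("both stimuli need the same number of factor scores")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Hypothetical scores on three semantic factors for two words.
word_a = (1.0, 2.0, 2.0)
word_b = (0.0, 0.0, 0.0)
print(semantic_distance(word_a, word_b))  # 3.0
```

A smaller distance corresponds to greater similarity of meaning, so the
measure is an inverse index of similarity.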

Functional  Equivalence 

This  approach  attempts  to  specify  the  similarity  of  two  stimuli 
by  measuring  the  extent  to  which  one  stimulus  can  replace  the  other  in 
a  behavioral  relationship.  In  other  words,  similarity  is  scaled,  not 
directly, but by observing some aspect of behavior that has a
well-established theoretical relationship with similarity. Examples of
phenomena  through  which  this  method  has  been  tested  are  stimulus 
generalization  in  classical  and  instrumental  conditioning,  transfer  of 
training,  intra-list  interference  in  acquisition  and  clustering  in  free 
recall. 

Stimulus  generalization  in  classical  conditioning.  Razran  (1949) 
classically  conditioned  a  salivation  response  to  a  meaningful  word,  and 
then  measured  the  amount  of  conditioned  response  to  test  words  of  varying 
degrees  of  semantic  and  phonemic  similarity.  Riess  (1940)  measured 
generalization  of  a  conditioned  galvanic  skin  reflex  to  meaningful  words, 
while Abbott and Price (1964) conditioned an eyeblink response to a
trigram, then tested subjects' responses to the training stimulus and other
trigrams.

Stimulus generalization in instrumental conditioning. Dicken
(1961) trained an instrumental lever-pulling response to a set of words,
then tested subjects with another group of words and noted the frequency
of response to each. Postman (1951) presented subjects with a number of
6-letter  nonsense  bi-syllables,  then  had  them  attempt  to  recognize  the 
previously-experienced  stimuli  from  among  a  larger  group  of  bi-syllables. 
Similarity  of  a  given  class  of  bi-syllable  was  determined  by  its  relative 
frequency  of  being  "recognized". 

Transfer of training. A typical study in which similarity was
measured by transfer was done by Bastian (1961). Subjects learned first
one paired-associate list, then learned a second list containing
identical stimuli and different responses, followed by additional practice on
the first list. Similarity of the response members was indicated by the
amount of positive transfer associated with each response. A similar
design was used by Ryan (1960), except that the similarity of the
stimulus members rather than the response members was measured.

Intra-list  interference  to  paired-associates  learning.  Wimer 
(1963)  estimated  the  average  similarity  of  groups  of  meaningful  words 
through intra-list interference by using them as stimuli in
paired-associates lists and determining the mean number of trials that subjects
required  to  learn  each  list.  Feldman  and  Underwood  (1957),  in  addition 
to  measuring  average  similarity  of  stimulus  and  response  members  in 
paired-associates  lists  by  trials  to  criterion,  also  used  mean  overt 
errors  per  trial  as  an  additional  measure  of  similarity. 

Clustering  in  free  recall  of  lists.  Bousfield,  Whitmarsh  and 
Berkowitz  (1960)  measured  the  frequency  of  co-occurrence  of  pairs  of 
words  in  free  recall  after  the  list  of  words  had  been  presented  for 
learning in a randomized list. "Clustering" was used as a measure of
similarity.
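
One simple way to picture a co-occurrence count of this kind is to tally
how often two words are recalled next to each other across recall
protocols. The Python sketch below does only that; the adjacency rule, the
function name, and the recall lists are assumptions made for illustration
and should not be read as the scoring procedure of the study cited.

```python
from collections import Counter


def adjacent_pair_counts(recall_orders):
    """Count how often each unordered pair of words is recalled adjacently."""
    counts = Counter()
    for order in recall_orders:
        # Pair each recalled word with the word recalled immediately after it.
        for first, second in zip(order, order[1:]):
            counts[frozenset((first, second))] += 1
    return counts


# Invented recall protocols from two hypothetical subjects.
recalls = [["cat", "dog", "oak"], ["oak", "dog", "cat"]]
counts = adjacent_pair_counts(recalls)
print(counts[frozenset(("cat", "dog"))])  # 2
print(counts[frozenset(("cat", "oak"))])  # 0
```

Pairs with higher counts would be treated as more similar under this
reading of clustering.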

In conclusion, a review of measures of similarity relations among
verbal stimuli has been presented. It was shown that under one
classification system the measures could be classed according to the assumptions
underlying the experimental operation by which similarity was specified.
These assumptions can be described dichotomously: those measures based on
physical similarity and those based on psychological similarity; and
dimensional versus common element measurement procedures. An alternative
classification system groups measures of similarity into three classes
according to the type of measuring operation: those based on intrinsic
properties of the stimulus (the letter and phoneme common element measures),
those based on subjects' direct responses to the stimuli (associative
overlap, direct scaling, latency, production of similarity criteria, and the
semantic differential), and measures based on subjects' responses to
stimuli when they are part of a more complex task (the functional equivalence
measures).

The  Locus  of  Similarity 

From the above review it would seem that the majority of similarity
measures have implicit in them the assumption that similarity relations
between stimuli can be studied directly through the properties of the verbal
stimuli themselves. Some measures presumed that the similarity of stimuli
could be specified in terms of their physical characteristics. Other
measures were based on psychological similarity; i.e., as mediated by the
properties of responses elicited by the stimuli. However, most measures of
psychological similarity seem to assume that the response to a given stimulus
is sufficiently stable and stereotyped within cultural groups that common
standards may be applied to individuals within the group (e.g., Osgood, 1953).
Despite the fact that Osgood (1962) reports finding inter-individual
differences in semantic differential ratings of words within cultural groups,
most studies have apparently tacitly assumed that differences in semantic
structure between individuals can safely be ignored in studies of similarity.
Garskof and Houston (1963) showed that a measure of similarity based on
individual subjects' data gives a more reliable relationship with stimulus
generalization than does a measure based on pooled data from many subjects.
However, their study seems to be a lone exception to the rule that individual
differences have received little attention in the study of similarity
relations between verbal stimuli.

It is suggested that a shift in emphasis from the study of
similarity through the invariant properties of stimuli to the study of the
responses of the individual organism to the stimuli would greatly clarify
the problem of individual differences. This approach was suggested by
Osgood in 1953, but does not seem to have been seriously or consistently
considered in verbal learning studies, even by Osgood himself. Briefly,
it is presumed that an environmental stimulus can produce a mediating
response in the organism's central nervous system. The nature of this
response is determined by a number of factors: the sensory processes involved,
the organism's previous experience with the stimulus, and the aspects of
the stimulus being attended to, to mention only a few. These mediating
responses have stimulus properties to which overt responses can be attached,
resulting in such phenomena as stimulus generalization and transfer of
learning as a function of the similarity of the mediating responses.

This mediating response hypothesis redefines the dichotomy between
primary and mediated or secondary generalization; it states that all
generalization is mediated in that it is a function of the properties of responses
in the central nervous system to the stimuli. According to this view what
is called primary generalization occurs when the mediating response is
determined directly by the sensory processes elicited by the stimuli,
while secondary generalization occurs when the mediating response to the
sensory process is modified by learning. Thus, psychological similarity
can be studied at three stages. First, we may study the events antecedent
to the mediating responses, including the physical properties of the
stimuli, the organism’s prior experience with the stimuli, and how it attends
to the stimuli. Second, the mediating responses themselves may be examined;
and third, the consequent behavior involving these mediating responses can
be measured.


It can be seen that some of the measures of similarity reviewed
above can be arranged on a continuum, the extremes of which represent the
antecedent and consequent links of these mediating responses. Various
measures differ in that they attempt to measure similarity at different
points on the continuum. The phoneme and letter common-element measures
tap the similarity process at the antecedent end by assessing the physical
characteristics of single stimuli and then postulating the nature of the
psychological similarity relationship. In moving toward the consequent
end we next encounter the associative overlap and semantic differential
measures. Both these measures first study what might be considered as the
properties of mediating responses to single stimuli and then specify
hypothetical similarity relationships. At the consequent end of the dimension
we have the functional equivalence measures which assess the degree to
which one stimulus can substitute for another in stimulus-response
relationships.

The remaining scaling techniques (direct scaling, latency measures,
similarity criteria production, and referent co-occurrence) do not fit as
obviously into this continuum as do the others. These measures have one
aspect in common that differentiates them from other similarity measures;
namely, that similarity is scaled by the strength of a response elicited by
a pair of stimuli presented to a subject. The other measures present single
stimuli to the subject and relate the independently elicited responses by
some indirect procedure, so that the measure of similarity is derived from
the data. In the four techniques listed above, however, the similarity of
a pair of stimuli is directly inferred from the subject's response to that
pair. One of these measures, co-occurrence of referents, assumes that
similarity of verbal symbols is caused (at least partly) by previous
association of the referents of these symbols in the subject's environment and is
an attempt to scale the degree of this association. Because of this feature
co-occurrence of referents is probably best considered as a variety of
associative overlap measure. However, the other three measures entail no
assumptions as to the causes of similarity relations among verbal stimuli.
These measures (direct scaling, latency, and criteria production) are
concerned with similarity as an empirical, rather than as a theoretical,
function. They assume only that the psychological similarity of two stimuli
can be scaled directly by subjects in much the same way as attitudes or
preferences, and thus are best considered as operational definitions of
similarity.

This leads to the central point of this thesis: Do these operational
measures of similarity, especially direct scaling, which are essentially
attempts to psychophysically scale subjects' subjective estimates of
similarity, offer any advantage to the psychologist that is not found
in the antecedent and consequent measures of similarity? In terms of
experimental convenience and of relating the theoretical construct to the
intuitive concept of similarity, the answer would appear to be yes. A
simple scaling procedure for determining similarity relations would
clearly be more economical of effort, and more parsimonious, than
procedures such as the antecedent and consequent measures of similarity
described earlier. And the direct measures of similarity have the added
attractiveness that they are direct manifestations of human observers'
standards of similarity. In discussing what should be the primary data for
similarity relations, Coombs has stated:

It has always seemed self-evident to me that the observations
should be verbal judgments of similarity. We could,
of course, utilize a transfer experiment and observe changes
in amplitude or latency and use such observations to measure
similarity. Even if this were done, however, the ultimate
criterion for accepting the measure as a measure of similarity
would be subjective. It seems to me that verbal judgments
should be used to construct a measure of similarity, and
then psychological theory would revolve around the functions
that would relate similarity to other behavior phenomena.
(Coombs, 1964, pp. 436-437)

Granted that the empirical measures of similarity have some
intuitive appeal, it must now be shown that they are also useful. That is,
it must be shown that direct ratings of similarity are reliably related to
other events; specifically, to the antecedent conditions that presumably
cause similarity and to the consequent behavior that is affected by
similarity. A number of experiments indicate that empirical measures of
similarity are related to antecedent and consequent measures of similarity.
Representative studies of these relationships will be reviewed in the
following sections, together with examples of studies relating different
empirical measures of similarity.

Relations between Antecedent and Direct Measures of Similarity


In this section some representative studies relating antecedent
and direct measures of similarity are summarized. They have been grouped
according to whether the antecedent measures were derived from physical,
semantic differential, or associative commonality techniques of scaling
similarity.


Physical measures of similarity. Runquist and Joinson (in press)
had subjects rate pairs of nonsense syllables for similarity. They found
that similarity ratings could be predicted from the number and position of
shared letter elements in the pairs. Across two different sets of nonsense
syllables, the mean similarity ratings of pairs constructed according to
the same principles correlated rho = .977.

Semantic differential measures of similarity. Flavell (1961b)
found correlation coefficients between judged similarity and similarity as
determined by the semantic differential ranging from r = .86 for pairs of
adjectives to r = .40 for adjective-concrete noun pairs. Wimer (1963)
found that the correlation between judged similarity of nouns and semantic
differential similarity (summation across 20 scales) was r = .347.

Associative commonality measures of similarity. Wimer (1963)
obtained correlations between judged similarity of noun pairs and four
measures of associative overlap. Two of these correlations were significant,
those with "total associative overlap" (r = .405) and "associative
reciprocity" (r = .434), while the correlations with "associative overlap
within individuals" and "variety of associative overlap" were moderate but
nonsignificant. Haagen (1949) found a correlation coefficient of r = .90
between ratings of pairs of adjectives for "closeness of associative
connection" and "similarity of meaning". The former measure differs from
most scales of associative commonality in that subjects rated pairs of
stimuli instead of producing associations to single stimuli. It is
possible that Haagen's high correlation between associative commonality
and rated similarity is due to subjects' using rated similarity of
meaning as an indicant of associative connection. However, Cofer (1957)
obtained a more orthodox associative overlap index ("mutual frequency") for
pairs of Haagen's stimuli. Although Cofer did not report a correlation
between Haagen's (1949) similarity ratings and his own associative
commonality scores, it would appear from his published results (Cofer,
1957, p. 605) that a correlation of rho = .915 between rated similarity and
associative overlap can be calculated from his grouped data.

An important experiment was that of Garskof and Houston (1963)
in which two measures of associative overlap were correlated with each
other and with rated similarity of pairs of nouns. One measure of
associative commonality was "mutual frequency", used previously by Cofer
(1957). This measure involves only the first association response given by
a subject to a stimulus, and requires group data for the calculation of an
index. The second measure of associative commonality used by Garskof and
Houston was the "relatedness coefficient", which involves a weighting of
the sequence of association responses given by a subject to a stimulus.
This second measure can be used to calculate indices of associative overlap
for individual subjects. Garskof and Houston found correlations between
the "relatedness coefficient" and rated similarity of 24 word pairs ranging
from rho = .63 to rho = .94 among 20 subjects, all significant at the .01
level. The rank order correlation over all 20 subjects between the
"relatedness coefficient" and rated similarity was rho = .94.
In contrast, "mutual frequency" for 13 of the 24 word pairs as calculated
from group data was 0, and the correlation between "mutual frequency" and
rated similarity for the 11 pairs which did obtain a mutual frequency
greater than 0 was negative and nonsignificant. The correlation between
the "relatedness coefficient" and rated similarity for the same 11 pairs
was rho = .97. This is apparently the only published study in which measures
of similarity based on data from individual subjects have been calculated.
Garskof and Houston suggest that the discrepancy between their
nonsignificant results and Cofer's (1957) significant findings concerning
the relation between "mutual frequency" and rated similarity may be due to
the different numbers of subjects employed (20 and 356, respectively).
Nevertheless, this experiment would seem to support the contention that
similarity relations may be measured more accurately through single
subjects than through considering pooled data from a group of subjects.
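The rank-order coefficients (rho) reported in this section can be illustrated with a short sketch. The ratings and overlap scores below are hypothetical values for six word pairs from a single imagined subject; they are not data from Garskof and Houston or from any other study cited here, and the function names are illustrative only. The sketch computes Spearman's rho as the product-moment correlation of average ranks, which is one common way of handling tied scores.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: product-moment correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: one subject's similarity ratings of six word pairs and
# an associative overlap score for the same six pairs.
rated_similarity = [6, 2, 5, 1, 4, 3]
associative_overlap = [0.80, 0.10, 0.55, 0.20, 0.60, 0.30]

rho = spearman_rho(rated_similarity, associative_overlap)
print(round(rho, 3))
```

Computing the coefficient separately for each subject, rather than once on pooled means, is what distinguishes an individual analysis from a group analysis.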

Relations between Direct and Consequent Measures of Similarity

This section reviews representative studies relating direct measures
of similarity to consequent measures of similarity. These studies are
grouped according to the types of behavioral phenomena that were used as
consequent measures of similarity; namely, intralist effects in
paired-associates learning, interlist transfer, generalization of semantic
conditioning, and confusions in short-term memory.

Intralist effects in paired-associates learning. A number of
studies have shown that rated similarity between stimuli or responses can
affect various aspects of the acquisition of a paired-associates list.
Some representative studies involving stimulus similarity, response
similarity, and a "concept-learning" design with similar stimuli and
repeated responses will be described.

In a typical experiment studying stimulus similarity Underwood
(1953) had subjects learn paired-associates lists whose stimulus members
varied in similarity according to Haagen's (1949) norms. He found a complex
relationship between stimulus similarity and rate of learning, with
medium-similarity lists taking longest to learn. Wimer (1963) constructed
six-pair lists composed of six low-association responses used in all lists
and 32 common-noun stimuli, each stimulus used in six lists. She recorded
the mean number of trials to master each list to one errorless trial and
the mean similarity rating for all pairs of stimuli on each list. The
correlation between trials to criterion and mean list similarity was
r = .428.

Among experiments varying rated similarity of response members
in paired-associates lists was one by Higa (1963). He found that a list
containing as responses pairs of words rated as synonyms was more difficult
to learn than a control list. Underwood (1953) found that lists containing
similar responses produced rates of overt errors during acquisition that were
proportional to the degree of response similarity.

Richardson (1958) had subjects learn a number of 16-pair lists
consisting of two groups of eight stimulus members with high intragroup
rated similarity in all lists. The lists varied as to the number of
different response members that were used and also as to whether repeated
responses were paired with similar or dissimilar stimuli. In this way
learning a list with repeated responses to similar stimuli was analogous to
learning a concept. It was found that pairing a repeated response with
similar stimuli produced positive intralist transfer.

Interlist transfer. A number of studies have shown that rated
similarity of response members between lists causes positive transfer of
training from first to second list. Bastian (1961) found significant
transfer effects in second-list learning when pairs of response items in
the two lists were judged to be semantically similar by a direct scaling
technique. Both Underwood (1951) and Morgan and Underwood (1950) found that
interlist response similarity defined by Haagen's (1949) norms caused
facilitation of second-list learning and intrusions of first-list responses
during learning of the second list. Slamecka (1967) showed that rated
synonymity of responses in two lists causes positive transfer in both mixed
and unmixed list designs.

Generalization of semantic conditioning. Several studies have
shown that when a response has been trained to a verbal stimulus,
generalization of responding is shown to other stimuli judged similar to
the training stimulus. This has been shown for both classical and
instrumental conditioning procedures.

Typical of the classical conditioning studies was Razran's (1939)
in which a salivation response was conditioned to four words. Subjects
were then tested for response to synonyms of the training stimuli. He
found significant generalization of responding. Riess (1940) replicated
Razran's study using the galvanic skin response instead of salivation, with
comparable results.

Kurcz (1964) instrumentally conditioned a key-pressing response
to word stimuli. She found that subjects showed generalization to
synonymous words.

Confusions in short-term memory. Baddeley (1966) aurally presented
sequences of words to subjects who then wrote down the sequences from memory
after short delays. He found that using sequences of words that were judged
similar to each other produced a small but reliable decrement in recall of
the correct sequence.

Relations between Different Empirical Measures of Similarity

A few studies have shown that different empirical measures of
similarity are related. Flavell and Johnson (1961) found judged similarity
of concrete nouns correlated r = .57 with production of similarity criteria
and r = .70 with latency, indicating significant relationships. Attneave
(1951) found a correlation of r = .91 between judged similarity of concrete
nouns and criteria production. There is some evidence, then, that judged
similarity is related to the other direct measures of similarity. It should
be noted that there is a large difference between Attneave's (1951) and
Flavell and Johnson's (1961) reported correlations between judged
similarity and production of similarity criteria. Flavell and Johnson point
out that their experiment used two independent groups for the two measures
of similarity whereas Attneave had the same subjects undergo both
procedures. This supports the hypothesis that similarity relations may be
idiosyncratic to a given subject.

Summary  and  Conclusion 

It has been suggested that similarity is better considered as a
relationship between internal representations of external events than as a
direct relation between the external events themselves. Although schemes
have been proposed by which environmental events or objects are considered
as similar or dissimilar according to their intrinsic properties, it has
been shown that usually some criterion must be arbitrarily picked to
determine which properties of stimuli will be used to determine their
similarity. In addition stimuli are often judged as similar, not on their
own intrinsic properties, but according to properties of responses that are
elicited in the observer.

If similarity is considered in this way, then an illuminating
distinction can be drawn between the different types of similarity measures
used in verbal learning. Three classes of similarity measure have been
proposed in this paper. The first type studies the properties of
environmental events and the operations that cause two stimuli to be
perceived as similar. These typically involve a theory of the mechanism
involved in similarity relations. The identical elements measures imply
that stimuli are similar because they share similar sensory components.
These components are either the letters that make up the visual
representation of the verbal stimulus or the sounds of the vocal responses
the stimuli habitually elicit. These measures assume that the similarity of
the stimuli is determined by their sensory properties. Although the
semantic differential and the associative overlap measures assume that
similarity relations are a function of the covert responses (meaning) the
organism has learned to the stimuli, they have been classed with the
physical measures here because they specify theoretical mechanisms that
determine the similarity of stimuli. In the case of the semantic
differential the meaning of the verbal stimulus is the representational
mediation process it elicits which is part of the total behavior originally
provoked by the object for which the verbal stimulus is a sign. The
associative overlap measures assume that a given verbal stimulus elicits
covert verbal responses in addition to the one that is isomorphic to the
pronunciation of the given stimulus.

Another type of similarity measure, the consequent measures,
assumes that stimuli perceived as similar will affect responses learned to
these stimuli in a certain way. Any phenomenon that is theoretically related
to stimulus similarity can be used to construct a measure of this class:
stimulus and response transfer, stimulus differentiation in first-list
learning, and clustering of responses in free recall of a list are only a
few of the measures that have been or could be used.


The third type of similarity measure, direct rating, could be
considered as a variety of consequent similarity measure. However, direct
ratings have certain distinctive features that would seem to warrant putting
them in a category of their own. Whereas the consequent similarity measures
all are based on the subject's behavior toward the stimuli in a learning or
perceptual task, the direct rating measures involve a psychophysical
judgment by the subject that depends on his use of the predicate "is similar
to." This type of similarity measure was referred to as an empirical
measure, in that it is an operational definition of similarity entailing no
assumptions or description of the nature of similarity. It was shown,
however, that there is evidence indicating that these direct measures of
similarity, which can be regarded as subjects' psychophysical assessments
of the relations between their mediating responses to stimuli, are reliably
related to many of the antecedent and consequent measures of similarity.

It would seem that these empirical measures of similarity and
their relations with other types of similarity measure are a potentially
fruitful field for research that deserves detailed study. Two lines of
argument support this contention. The first is that it would seem
worthwhile to determine if subjects' impressions of similarity relations are
reliably related to the antecedent and consequent conditions of similarity.
The evidence reviewed earlier suggests this is the case; however, a valuable
contribution could be made by providing a more precise technique for scaling
perception of similarity.

The second argument concerns the present lack of cohesiveness in
the attacks on the antecedent and consequent aspects of similarity. Each
of the theories specifying how antecedent events affect the perception of
similarity might be valid; similarity might have a variety of causes, or
there might be more than one variety of similarity. As presently
formulated, however, the antecedent measures of similarity are limited in
the range of material to which they are relevant. It would be difficult for
an associative commonality measure to be used over a wide range of stimulus
meaningfulness, since by definition low-meaningful stimuli usually elicit
few associative responses. Although acoustic similarity has been shown to
have reliable effects on the learning and retention of high-meaningful
material (Dallett, 1966), experiments comparing physical and semantic
similarity seem to show that the dominant feature determining the
similarity of high-meaningful stimuli is their meaning (Razran, 1939;
Abbott, 1966), which limits the usefulness of the sensory component
measures. And research (Osgood, 1962) seems to indicate that the semantic
differential only assesses one limited aspect of meaning. The consequent
measures of similarity, for their part, involve complex phenomena the
measurement of which poses difficult problems in itself. It is entirely
possible, for instance, that the psychological distances between a set of
stimuli will be perceived differently by a subject when he is attempting to
discriminate between the stimuli for the first time while learning a list
than when he is attempting to recall the same stimuli after learning the
list to some criterion.

It is the thesis of this paper that subjects' direct ratings can
provide measures of similarity that relate to antecedent and consequent
conditions of similarity. It is also believed that scaling techniques are
best suited to obtain estimates of similarity relations unbiased by other
effects, and that in addition they provide the most convenient means of
obtaining data on similarity relations. If this presumption is not
immediately evident from previous data on similarity rating, it might be
explained in two ways.

First, previous similarity measures have almost invariably assumed
some basis for similarity. These assumptions were often explicit, as in
the semantic differential and associative overlap measures. But upon
examination of the instructions used in many of the direct rating studies
it is obvious that an implicit assumption has been involved. Phrases such
as "similarity of meaning" or "referring to the same things or actions"
typically are used to clarify to the subject the nature of the experimental
task. These instructions would also probably sensitize or set the subject
toward certain attributes of the stimuli and produce ratings which might be
limited in usefulness.

The second point is that all previous studies of direct similarity
ratings have used the pooled data from several subjects. Often this was
done of necessity as the reliability of the measures used was too poor for
use on individual subjects. What apparently has not always been realized
is that this procedure is based on the assumption that the commonality of
similarity structures between individuals is sufficiently great that "...
different subjects may be regarded merely as independent and random
replications of each other ..." (Coombs, 1964, p. 435). Coombs points out
that this is not the case if, for instance, the stimuli are color patches
and some subjects are color blind. It is interesting to note that Helm
and Tucker (1962) found individuals with normal color vision differed in
the three-dimensional psychological structures derived from their
color-rating data. When it is considered that meaning, one of the most
important attributes of verbal stimuli, is dependent on learned responses,
then it would seem reasonable to attempt to verify the assumption that
meaning structure is stable from individual to individual. Two findings
suggest that better results might be possible using data from individual
subjects. One is the large difference found by Garskof and Houston (1963)
between the correlations with similarity ratings of associative overlap
measures based on group data and on individual data. The other is the
difference between the correlations of rated similarity with similarity
criteria production found by Attneave (1951) with his intragroup procedure
and by Flavell and Johnson (1961) with their intergroup procedure.


In conclusion, this study will explore the hypothesis that
individual variations in similarity structures exist for verbal stimuli, and
that it is possible to assess individual similarity structures and the
theoretical relations between similarity and performance in verbal learning.
A proposed method for accomplishing this aim will be outlined in the
following section.

Individual Measurement of Similarity Relations

The main obstacle to attempting to obtain direct ratings of the
similarity of verbal stimuli for individual subjects appears to have been
the amount of labor required of the subject and the experimenter to get
reliable data by previous scaling methods. Osgood et al. (1957), for example,
in commenting on a study by Rowan (1954) comparing measures of similarity
obtained by the semantic differential and the method of "triads" (presenting
the subject with three stimuli and asking him to indicate which of the three
possible pairs of stimuli show the greatest or least degree of a given
relationship) say "The method of triads and other methods of the same type
are excessively laborious and time-consuming . . ." (p. 145). Although
they acknowledge that the method of triads involves fewer assumptions and
a more spontaneous basis of judgment, Osgood et al. conclude that "the 'freer'
but more laborious method of triads should be used to test the validity of
more 'restrictive' but simpler methods like the semantic differential"
(p. 146), rather than as a practical scaling method for experimental work.

If only scaling operations involving many repetitions of the same
similarity ratings in order to obtain stable estimates of stimulus distances
(e.g., Torgerson, 1952) are considered, then Osgood's estimate of the method
of triads as impractical is probably correct. However, Coombs (1964) has
suggested a method of analyzing data from the method of triads whereby a
single estimate of each triad can produce an ordinal scale of interstimulus
distances. Coombs points out that the assumption of transitivity of
similarity relations provides information which enables the experimenter to
demand less of the subject in the rating task. For example, if the
similarity of stimuli A and B is rated as greater than the similarity of
stimuli B and C, and if B and C are judged more similar than C and D, then
by the assumption of transitivity A and B should be more similar than C and
D. Coombs discusses scaling methods which provide enough redundancy to test
the assumption of transitivity, and yet enable the experimenter to arrange
the experimental task so as to obtain a large amount of information about
stimulus similarity from a moderate amount of effort on the part of the
subject.
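The role of the transitivity assumption can be made concrete with a small sketch. It is offered only as an illustration of the A-B-C-D example in the text, not as Coombs' procedure or as the method used in this study. The function names and the representation of a judgment as an ordered couple of stimulus pairs are assumptions introduced here. The first function fills in the comparisons implied by transitivity; the second checks whether any judgment, directly rated or implied, contradicts another.

```python
from itertools import product


def transitive_closure(judgments):
    """Add every comparison implied by transitivity.

    Each judgment (p, q) means that stimulus pair p was judged more
    similar than stimulus pair q.
    """
    closure = set(judgments)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (p, q), (r, s) in product(list(closure), repeat=2):
            if q == r and (p, s) not in closure:
                closure.add((p, s))
                changed = True
    return closure


def is_consistent(closure):
    """True if no pair is ranked both above and below another pair."""
    return not any((q, p) in closure for (p, q) in closure)


# The example from the text: AB > BC and BC > CD.
AB, BC, CD = ("A", "B"), ("B", "C"), ("C", "D")
rated = {(AB, BC), (BC, CD)}

closure = transitive_closure(rated)
inferred = closure - rated  # comparisons the subject never had to make

# A hypothetical intransitive subject who also judges CD > AB.
intransitive = transitive_closure(rated | {(CD, AB)})

print(inferred)
print(is_consistent(closure), is_consistent(intransitive))
```

Redundant judgments, such as asking directly for the comparison of AB with CD, are what allow the transitivity assumption itself to be tested rather than merely assumed.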


Only two studies seem to have utilized Coombs' suggestions for
the scaling of similarity of verbal stimuli. Vastenhouw (1962) had subjects
rate for similarity 17 names of personality traits. He found that subjects
were consistent in their ratings and that information from a limited number
of ratings could be used to predict the rated similarity of combinations
not previously judged. He also reported that subjects' predictions of the
probability of co-occurrence of personality traits in persons were related
to their similarity ratings of the traits. Dember's (1957) subjects scaled
the words "always", "often", "rarely", "seldom", and "never". The response
latency of judgment was proportional to the semantic distance between
stimuli. However, the stimuli had been chosen so as to be described by but
a single dimension of meaning. Thus, although Coombs' procedure seems
promising for the scaling of verbal stimuli, no extensive test appears to
have been made of the technique for this purpose.
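A rough sense of the labor that complete scaling would demand, and hence of the value of rating only a portion of the comparisons, can be had from simple counting. The sketch below is illustrative arithmetic only. The function name is hypothetical, and no claim is made about how many judgments any cited study actually collected. It counts the distinct pairs, the distinct triads, and the distinct pairs of pairs that exist for a given number of stimuli.

```python
from math import comb


def comparison_counts(n_stimuli):
    """Count pairs, triads, and pairs of pairs for n_stimuli stimuli."""
    pairs = comb(n_stimuli, 2)        # every two-stimulus combination
    triads = comb(n_stimuli, 3)       # every three-stimulus combination
    pairs_of_pairs = comb(pairs, 2)   # every comparison of one pair with another
    return pairs, triads, pairs_of_pairs


# For a set of 17 stimuli, the size of the trait list mentioned in the text.
counts_17 = comparison_counts(17)
print(counts_17)  # (136, 680, 9180)
```

The number of pairs of pairs grows roughly with the fourth power of the number of stimuli, which is why a procedure that infers unrated comparisons is attractive for individual subjects.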


Effects  of  Familiarization 


Presumably, testing an idiographic approach to similarity scaling
would necessitate a within-subjects experimental design. It can be seen
that the scaling procedure and the validation operation must involve the
same subjects if it is assumed that each subject has his own idiosyncratic
structure of similarity relationships. This poses the problem of whether
being exposed to the stimuli in the scaling procedure before the validation
test would affect performance on the validation operation, or whether the
converse would happen if similarity scaling were done after the validation
procedure. If a learning task is involved in the validation procedure the
process of familiarization would be involved.

Familiarization refers to a group of experimental procedures in
which a subject receives a controlled amount of exposure to material prior
to attempting a learning task involving this material. It has been
hypothesized (Gannon and Noble, 1961) that amount of previous experience
with a word determines its meaningfulness; hence experimental
familiarization of material before including it in a learning task should
increase its meaningfulness and the ease of learning the material through
acquired distinctiveness. There has been controversy over the nature of the
theoretical mechanism that supposedly mediates this acquired
distinctiveness. In their review of work done on discrimination learning,
Tighe and Tighe (1966) conclude that there are two main schools of thought
concerning the cause of acquisition of distinctiveness of cues. The
differentiation hypothesis has it that with practice the organism becomes
more sensitive to distinguishing cues within the stimuli. The mediation
hypothesis assumes that the organism supplements the originally similar
cues in the stimuli with distinguishing covert responses, thus rendering
the stimuli more distinctive.

However,  a  further  aspect  of  the  problem  complicates  the  matter 
even  more.  Operations  which  have  frequently  been  used  to  specify  familiar¬ 
ization  procedures — repetition  or  inspection  of  the  experimental  material — 
have  also  been  used  to  produce  the  phenomenon  of  semantic  satiation,  or 
decrease  in  meaning  of  verbal  stimuli  (e.g.,  Lambert  and  Jakobovits,  1960). 

To confuse the issue even further, conflicting data have been reported both
from experiments designed to investigate familiarization and from experiments
designed to investigate semantic satiation: both types of study have produced
increases and decreases in meaningfulness. Amster (1964) has reviewed the
evidence concerning these conflicting effects in satiation studies and
concludes that many of the data can be accounted for by variation in
experimental procedures and materials. Goss and Nodine's (1965) review of the
effects  of  stimulus  familiarization  on  subsequent  learning  suggests  that 
the  equally  contradictory  results  found  in  this  field  may  have  resulted 
from  the  welter  of  familiarization  procedures  as  well.  It  would  appear  that 
closer  control  of  the  stimulus  exposure  operations  will  be  necessary  to 
determine  how  the  mechanisms  of  familiarization  and  satiation  work. 

It  can  be  seen  that  a  stimulus  rating  procedure  can  be  considered 
as  a  stimulus  exposure  operation,  and  that  having  a  subject  rate  a  group 
of stimuli before attempting a learning task could possibly affect
performance on this task. Conversely, it is conceivable that learning responses
to stimuli in a paired-associates list might affect the mediating responses
to the stimuli and hence the ratings of similarity relations between the
stimuli (cf. Staats and Staats, 1957). Alternatively, it might be expected
that learning to associate distinctive responses to a group of stimuli
could cause the subject to attend more to the distinguishing aspects of the
stimuli, and cause a change in similarity ratings. In any case it would be
of interest to examine subjects' performance on associating responses to
stimuli that had previously been rated for similarity, and also to test
whether using a group of stimuli in a paired-associates learning task changes
the perceived similarity relations among the stimuli.


Statement  of  the  Problem 


No  method  has  previously  been  tested  for  estimating  the  rated 
similarity  of  verbal  stimuli  for  an  individual  subject.  As  the  properties 
of  verbal  material  determining  their  perceived  similarity  would  seem  to  be 
caused  by  responses  elicited  by  them  in  the  organism,  and  as  the  nature  of 
the  response  to  a  given  stimulus  would  depend  on  the  organism's  previous
experience,  it  follows  that  an  idiographic  measurement  technique  would  be 
necessary  to  determine  the  similarity  relations  between  stimuli  which 
different  individuals  might  have  experienced  in  different  ways.  This  study 
will attempt to validate a technique for measuring the similarity of verbal
material over a wide range of meaningfulness for individual subjects. This
will be accomplished by determining both the reliability of the proposed
procedures and the relationship between rated similarity of stimuli and
stimulus confusion errors in a paired-associates learning task.

A secondary purpose of this investigation will be to study the
effects of stimulus familiarization (produced by a similarity-rating
operation) on acquisition in subsequent paired-associates learning. A related
secondary purpose will be to study the effects of learning distinguishing
responses to stimuli upon the perceived similarity of those stimuli.


CHAPTER II


Method 


The  general  plan  of  the  experiment  was  that  subjects  first  rated 
an  experimental  and  a  control  set  of  stimuli  for  similarity,  then  attempted 
to  learn  responses  in  a  paired-associate  task  to  half  of  these  stimuli  plus 
an  equal  number  of  new  stimuli  for  a  set  number  of  trials.  On  the  last 
trial  of  the  learning  task  a  recognition  test  was  substituted  for  the  recall 
test. Finally, the subjects repeated their initial similarity ratings and
also rated those stimuli that had been used in the paired-associate task but
had not been rated before the learning task.

Material 


Two paired-associate lists (Table I) were constructed, each list
containing 12 pairs. The same 12 two-syllable adjectives were used as
responses in both lists. The response terms were selected so that no two
began with the same letter, no words had any obvious semantic relation to
each other, and all had an AA rating in Thorndike and Lorge's (1944) list
(signifying that they were among the 1,000 most frequently occurring words
in written English and so are presumably highly meaningful in the commonly
used sense of the word). These measures were taken to minimize the effect
of response similarity during the learning of the lists. The stimulus items
were selected according to two main principles:

1. Each list contained two equal groups of homogeneous stimuli representing
two distinct and non-overlapping levels of meaningfulness.

2. Within each meaningfulness level there was a broad range of
inter-stimulus similarity.

The low-meaningful stimuli were consonant-consonant-consonant (CCC)
nonsense syllables from Witmer's (1935) list which corresponded to no known
English words or abbreviations. All syllables had Witmer association scale
values of 17% or less. The mean association value for the CCCs in List 1
was 10.0%, and for those in List 2, 12.2%.


TABLE I

STIMULUS AND RESPONSE TERMS OF PAIRED-ASSOCIATE LISTS

          Stimuli                  Responses
    List 1        List 2           Lists 1 & 2

    DJZ           HOLY             EARLY
    WZQ           NICE             ABOVE
    CGP           LAZY             BROKEN
    ZMJ           DARK             HEAVY
    MZJ           RICH             WEARY
    ZXJ           CALM             FAMOUS
    HARD          CXK              READY
    FULL          XGK              SIMPLE
    MILD          ZKG              OFTEN
    SOFT          WGP              PUBLIC
    SLOW          MHB              TENDER
    WISE          GQK              COMMON


The high-meaningful stimuli were four-letter adjectives taken from
Thorndike and Lorge's (1944) list. Two (both from List 1) were estimated to
be among the approximately 500 most frequently occurring words in written
English, five (two from List 1, three from List 2) as among the first 1,000,
three (one from List 1, two from List 2) from the first 2,000, one (List
1) from the first approximately 2,800, and one (List 2) from approximately
the first 3,400 most commonly occurring words.

From  the  above  specifications  it  would  seem  reasonable  to  assume 
that  the  stimulus  material  represents  two  discrete  groups  of  items  on  the 
dimension  of  meaningfulness,  while  the  homogeneity  of  meaningfulness  within 
the  groups  is  quite  high.  The  reasonableness  of  the  assumption  of  a  large 
inter-group  difference  would  seem  assured  by  the  fact  that  half  the  items 
are  virtually  unpronounceable  combinations  of  letters  that  have  been  selected 
for  their  low  probability  of  eliciting  any  associations  from  subjects,  while 
the other half are extremely common English words. Technically, the
assumption of high intra-group homogeneity is not as obviously reasonable, for
the meaningfulness index of the CCCs ranges from 0 to 17%, while the
highly meaningful material ranges from words among the approximately 500
most frequent to words among the approximately 3,400 most frequent in
written English. However, an examination of the
extremities among these stimuli (ZKJ and WGP versus ZXJ in the CCCs and FULL
and HARD versus LAZY in the meaningful words) lends little intuitive support
to the argument that appreciable intra-group differences might exist in the
functional meaningfulness of these stimuli for highly literate subjects,
relative to the inter-group difference.

Choosing the stimuli so as to ensure a wide range of inter-stimulus
similarities on dimensions other than meaningfulness was done in two
different ways. As it was assumed that the dominant feature determining
inter-stimulus similarity for the CCCs was formal similarity, the nonsense
syllable stimuli in the two lists were selected to meet the following
criteria:

(1) The initial and final letters in two stimuli were the same.

(2) A third stimulus contained the same three letters as one of the
stimuli in (1), but with the first two letters transposed.

(3) A fourth stimulus contained the two letters common to the first three
stimuli, but in different positions.

(4) A fifth stimulus had the same middle letter as the stimulus in (3).

(5) The sixth stimulus had no letters in common with other stimuli.

(6) All letters apart from those specified in (1) to (5) were different.

In summary, in six stimuli one letter was repeated five times,
one letter was repeated four times, one letter was repeated twice, and
seven letters were used only once. The repeated letters occupied a variety
of positions and combinations of positions in the stimuli.
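
The letter counts in this summary can be checked mechanically. The short Python sketch below tallies letter frequencies for the six List 1 CCCs as they are read from Table I; the transcription of the syllables and the function name are assumptions made for illustration, not part of the original procedure.

```python
from collections import Counter

# List 1 CCC stimuli as read from Table I (assumed transcription of the scan).
LIST1_CCCS = ["DJZ", "WZQ", "CGP", "ZMJ", "MZJ", "ZXJ"]


def letter_profile(stimuli):
    """Return how often each letter occurs across the stimuli, largest count first."""
    counts = Counter("".join(stimuli))
    return sorted(counts.values(), reverse=True)


profile = letter_profile(LIST1_CCCS)
print(profile)  # one letter 5 times, one 4 times, one twice, seven once
```

For these six syllables the tally agrees with the summary: Z occurs five times, J four times, M twice, and the remaining seven letters once each.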

It was presumed that the dominant feature to which subjects react
in judging the similarity of meaningful stimuli is their semantic
properties. The meaningful stimuli were selected from those words in the
Semantic Atlas published by Jenkins (1960), which gives scores on the three
principal factors of the semantic space found by Osgood et al. (1957), that
also appeared in Thorndike and Lorge's (1944) list of words with high
frequency of occurrence in written English. Assuming that these three
factors divide the semantic space into eight octants, six stimuli were
chosen for each list that met the following criteria:

(1) Two stimuli were from the same octant.

(2) Three stimuli were from three other octants.

(3) One stimulus was from near the point of origin of the three principal
factors, i.e., was relatively "neutral" by the criteria of the Semantic
Differential.

It was hoped that this procedure would produce a wide range of
inter-stimulus differences, and that the two lists would be as comparable as
possible as to inter-stimulus differences. However, for the high-meaningful
stimuli it is realized that the second objective could be reached
only in the most approximate of fashions, for a variety of reasons:

(1) Although the criteria for selecting octants were the same for both
lists, a different pattern of octants was sampled for the two lists. Only
one stimulus from List 2 came from the octant that yielded two stimuli for
List 1; there was no stimulus on List 1 from the octant where the two most
similar stimuli on List 2 originated.

(2) Little faith was put in the reliability of the inter-stimulus
differences determined from the Semantic Atlas for the reasons discussed
earlier in this thesis.

(3) Although one stimulus on each list was supposedly "neutral" in that
its three principal axis scores were all close to the point of origin in
the semantic space, it is probably more likely that these stimuli, as
Jenkins (1960) points out, draw inconsistent and therefore cancelling responses
from subjects. This conclusion is supported by the fact that although the
two "neutral" words involved, FULL and DARK, are approximately contiguous in
the semantic space with the concept GOJEY (a meaningless "paralog"), they
are also quite close to CONSCIENTIOUS OBJECTOR and SOCIALISM, concepts that
would presumably be reasonably meaningful for the American college students
who were the standardization sample for the Semantic Atlas. It is presumed,
then, that FULL and DARK will not be neutral or central points, but will
show extremely large inter-subject variations as points in semantic space.

It might be asked why the Semantic Atlas was used to select these
stimuli, if it is known that its reliability is so poor. The problem here,
however, is not to get an accurate rating of the inter-stimulus distances
but to ensure that the sample of stimuli selected represents an adequate
range to test the proposed procedure. Although the Semantic Differential
would seem to be an imperfect instrument for measuring inter-stimulus
distances, it appears to be the best means available of assuring a wide
range of inter-stimulus distances among highly meaningful verbal stimuli.
It must be realized, of course, that less is known about the differences
between the high-meaningful stimuli than about the distances between the
low-meaningful stimuli, and so there is less assurance that the high-
meaningful stimuli on List 1 will be roughly comparable to the high-
meaningful List 2 stimuli.

The  12  response  terms  were  paired  at  random  with  the  12  stimulus 
items  of  List  1  and  the  12  stimulus  items  of  List  2,  the  only  restriction 
being  that  the  responses  which  were  paired  with  the  high-meaningful  material 
on  List  1  were  paired  with  the  low-meaningful  material  on  List  2,  and  vice 
versa.  Three  presentation  orders  of  the  12  stimulus-response  pairings  and 
three  recall  test  orders  of  the  12  stimuli  alone  were  arranged  for  the  two 
lists.  The  orders  were  randomized,  with  the  restriction  that  the  stimuli 
in  the  last  two  stimulus-response  pairs  in  a  given  presentation  could  not 
occur  in  the  first  two  stimuli  in  the  immediately  ensuing  recall  test.  The 
same  arrangements  of  stimulus-response  pairings  and  stimuli  were  used  in
Lists  1  and  2,  equivalence  between  items  on  the  two  lists  being  decided  by 
the  common  response. 
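
The restricted randomization of presentation and recall orders described above can be sketched with simple rejection sampling. This is only an illustration under stated assumptions: the stimulus labels `S1` to `S12`, the function name `make_orders`, and the retry limit are hypothetical, and the original orders were not necessarily produced this way.

```python
import random


def make_orders(stimuli, rng, max_tries=1000):
    """Draw one presentation order and one recall-test order.

    The pair is accepted only if neither of the last two presented
    stimuli is among the first two stimuli of the recall test.
    """
    for _ in range(max_tries):
        presentation = rng.sample(stimuli, len(stimuli))
        recall = rng.sample(stimuli, len(stimuli))
        if not set(presentation[-2:]) & set(recall[:2]):
            return presentation, recall
    raise RuntimeError("no admissible pair of orders found")


# Hypothetical labels standing in for the 12 stimuli of one list.
stimuli = [f"S{i}" for i in range(1, 13)]
presentation, recall = make_orders(stimuli, random.Random(0))
```

Three such pairs of orders per list would correspond to the three presentation orders and three recall test orders mentioned in the text.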

The recognition test for any given subject consisted of 24
stimulus-response pairings, with each stimulus and response in the
paired-associate list occurring twice in the test. A number of forms of the
recognition test were used. To construct the different tests (Table II),
four six-by-six matrices of all the possible combinations of stimuli and
responses in the four groups of six stimulus-response pairs were drawn up,
each comprising six correct and 30 incorrect possible stimulus-response
pairings. From each of these matrices two basic recognition tests were
constructed, each test being divided into two halves. (Two forms of the
test were used to ensure that while a broad sample of mispairings was
tested, the individual subject was not overburdened. The test was divided
into halves to separate the two occurrences of each stimulus and response
as widely as possible, and to allow for counterbalancing to control sequence
effects.)

[TABLE II. PAIRINGS OF STIMULI AND RESPONSES USED ON RECOGNITION TEST]

Latin  squares  were  used  to  impose  the  following  conditions  upon 
these  tests: 

(1)  Each  half-test  contained  two  correctly  paired  and  four  mismatched 
stimuli  and  responses. 

(2) Each stimulus and each response was used once and only once in each
half-test.

(3) Stimuli and responses that had occurred as correct pairings in a given
half-test were used in mismatched pairs in the complementary half-test.

(4) No particular stimulus-response mismatching was repeated on both halves
of a test.

(5)  The  two  forms  of  the  recognition  test  were  made  as  independent  as 
possible  of  each  other  in  the  following  way: 

(a) Only two of the six possible correct stimulus-response pairings were
repeated. These two repetitions were unavoidable, of course, as there were
eight correct pairings used in all.

(b) No stimulus-response mismatching was repeated on both tests.

(c) From the correct stimulus-response pairs S_i - R_i and S_j - R_j two
mismatched pairs, S_i - R_j and S_j - R_i, can be constructed. These two
mispairings might be considered as "mirror images" of each other. Only three
of the mismatched pairs presented to a given subject were mirror images of
other mispairings. Thus, of the 16 stimulus-response mispairings involved
in a given six-by-six matrix, 13 may be regarded as orthogonal to each other,
and three to represent mirror images of other mismatches. This proportion
of mirror image to orthogonal mismatches represents the minimum possible
number of overlaps of this type under these conditions. It was considered
desirable to sample the widest possible range of independent stimulus-
response mispairings for the recognition tests. For all 16 mismatches to
be independent would necessitate using a set of orthogonal Latin squares.
Unfortunately, these do not exist for a six-by-six matrix (Winer, 1962).
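
Conditions (1) to (4) and the notion of a "mirror image" mismatch can be expressed as a small checker. The sketch below is illustrative only: it assumes a hypothetical encoding in which stimuli and responses are numbered 0 to 5 and response s is the correct partner of stimulus s, and the two example half-tests are invented for demonstration, not taken from Table II.

```python
def mismatches(half):
    """Set of mismatched (stimulus, response) index pairs in one half-test."""
    return {(s, r) for s, r in enumerate(half) if s != r}


def check_test(half_a, half_b, n=6):
    """Check conditions (1) to (4) for the two halves of one test.

    half[s] is the index of the response shown with stimulus s.
    """
    for half in (half_a, half_b):
        if sorted(half) != list(range(n)):   # (2) every item used exactly once
            return False
        if n - len(mismatches(half)) != 2:   # (1) two correct, four mismatched
            return False
    correct_a = {s for s, r in enumerate(half_a) if s == r}
    correct_b = {s for s, r in enumerate(half_b) if s == r}
    if correct_a & correct_b:                # (3) correct pairs differ by half
        return False
    # (4) no mismatch appears in both halves
    return not (mismatches(half_a) & mismatches(half_b))


def mirror_images(pairs):
    """Count mismatch pairs (i, j) whose mirror image (j, i) is also present."""
    return sum(1 for (s, r) in pairs if s < r and (r, s) in pairs)


# Invented example halves (not the thesis's actual tests).
half_a = [0, 1, 3, 4, 5, 2]
half_b = [4, 5, 2, 3, 1, 0]
ok = check_test(half_a, half_b)
n_mirror = mirror_images(mismatches(half_a) | mismatches(half_b))
```

A checker of this kind could be applied to both forms of a test together to count how many of the 16 mismatches are mirror images of one another.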

In summary, two forms of the recognition test were constructed
for each of the four groups of high- and low-meaningful stimuli from Lists
1 and 2, each test consisting of two halves. In a given half all stimulus
and response items occurred only once, in two correctly matched and four
mismatched pairs. The pairings in one half of the test were independent of
the pairings of the items in the other half of the test. Thus each test
contained four correct pairings and eight different mismatches of stimuli
and responses. No stimulus-response pairings were repeated in the two
forms of the test, although three pairs of mismatches occurred that were
derived from the same pairs of correct pairings.

The final form of the recognition test, as used in the experiment,
was constructed by arranging in random order the items from the first halves
of the high- and low-meaningful recognition material, and then repeating
this process for the second halves. This procedure was done separately for
Lists 1 and 2. The end product was four recognition tests, two for List 1
and two for List 2. Each test was in two parts, each part containing 12
stimulus-response pairings with both high- and low-meaningful stimulus
terms in random order.

A  practice  list  of  eight  nonsense  shapes  paired  with  eight  one- 
digit-number  responses  was  prepared.  Three  orders  of  the  stimulus-response 
pairings  for  presentation,  two  orders  of  the  stimuli  only  for  recall  testing, 
and  one  recognition  test  containing  four  correct  pairs  and  four  mismatches 
were  used.  This  practice  list  was  used  to  ensure  that  the  subjects  under¬ 
stood  the  instructions  concerning  the  experimental  procedure. 

Four  forms  for  rating  the  similarity  of  the  stimulus  items  were 
composed,  corresponding  to  the  four  combinations  of  high-  and  low-meaningful 
items  from  List  1  and  List  2  (for  an  example,  see  Appendix  A).  Each  form
contained  the  20  possible  combinations  of  the  six  stimuli  taken  three  at  a 
time.  The  sequence  of  the  triads  was  randomized  separately  for  each  form. 
Each  triad  of  stimuli  was  arranged  in  a  triangle,  with  one  term  representing 
the  apex  and  the  other  two  terms  forming  the  base.  The  designation  of  the 
three  terms  into  apex,  left  base  and  right  base  positions  was  randomized 
in  each  triad  with  the  restriction  that  no  stimulus  term  was  repeated  in  a 
given  position  less  than  three  times  or  more  than  four  times. 
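
The arithmetic behind the rating forms can be illustrated as follows. Six stimuli taken three at a time give 20 triads; each stimulus appears in 10 of them, so the position restriction amounts to a 3, 3, 4 split over the apex, left base and right base positions. The sketch uses placeholder labels A to F, and the helper `positions_balanced` is a hypothetical checker, not the procedure used to build the original forms.

```python
from collections import Counter
from itertools import combinations

STIMULI = ["A", "B", "C", "D", "E", "F"]  # placeholder labels for six stimuli

# All combinations of the six stimuli taken three at a time.
triads = list(combinations(STIMULI, 3))

# How often each stimulus, and each pair of stimuli, occurs across the triads.
appearances = Counter(s for triad in triads for s in triad)
pair_counts = Counter(pair for triad in triads for pair in combinations(triad, 2))


def positions_balanced(arranged):
    """True if every stimulus fills each of the three positions 3 or 4 times.

    `arranged` is a list of (apex, left_base, right_base) tuples.
    """
    per_position = [Counter(triad[i] for triad in arranged) for i in range(3)]
    return all(3 <= per_position[i][s] <= 4
               for i in range(3) for s in STIMULI)
```

Each of the 15 possible pairs occurs in four different triads, so every pair is judged against four different third stimuli. The unshuffled output of `combinations` fails the position check (the first stimulus is always in the first slot), which is why the positions have to be randomized under the stated restriction.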


Design 


Each  of  64  subjects  went  through  three  basic  steps: 


(a) They rated for similarity two sets of six stimuli.

(b) They attempted to learn 12 paired-associates for a prearranged number
of trials by the study-test method (alternating presentations of the 12
pairs with instructions to learn the pairings, and presentations of the 12
stimuli only, with instructions to recall the missing responses). On the
last trial a recognition test was substituted for the recall test.

(c)  They  rated  for  similarity  the  two  sets  of  stimuli  previously  rated, 
plus  a  third  set  of  stimuli. 

Differences  in  this  procedure  were  as  follows: 

(1)  Half  of  the  64  subjects  rated  for  similarity  the  low-meaningful  stimuli 
of  both  List  1  and  List  2,  the  other  half  rated  the  high-meaningful  stimuli 
of  Lists  1  and  2.  Half  of  the  subjects  did  the  List  1  ratings  first  and 
then  rated  the  List  2  material  while  the  other  half  rated  the  List  2  material 
first  and  List  1  second. 

(2)  Four  groups  of  16  subjects  each  attempted  to  learn  the  12  word-pairs 
for  either  2,  4,  8,  or  16  trials,  respectively.  Half  of  the  subjects  in 
each  group  practiced  on  List  1,  the  other  half  on  List  2. 

(3)  All  subjects  repeated  the  two  similarity  ratings  they  had  done  in  (1), 
except  that  the  order  of  the  ratings  was  reversed.  All  subjects  also  rated 
for  similarity  the  set  of  six  stimuli  from  the  list  that  they  had  attempted 
to  learn  responses  to  in  (2),  but  which  had  not  been  rated  previously,  e.g., 
a  subject  who  had  first  rated  the  high-meaningful  stimuli  from  both  Lists  1 
and 2, and then had attempted to learn the stimulus-response pairings of
List 1 would subsequently repeat his ratings on the high-meaningful stimuli
from Lists 1 and 2, and in addition would rate the low-meaningful stimuli
from List 1. Half the subjects repeated their previous ratings first before
rating the new material, while the other half did the new rating before the
old ones.


Table III summarizes the assignment of the combinations of the
variables of list, meaningfulness, order of rating of the sets of stimuli,
and the list used in the learning task for 16 subjects. This matrix of
experimental conditions assigned to subjects was repeated four times, once
for each group of subjects corresponding to the four degrees of learning
used in the experiment.


TABLE III

ASSIGNMENT OF SUBJECTS TO COMBINATIONS OF MEANINGFULNESS AND
LIST MEMBERSHIP OF STIMULI IN SIMILARITY RATINGS AND OF LIST
USED IN LEARNING TASK

           MEANINGFULNESS &     LIST USED        MEANINGFULNESS &
           LIST OF STIMULI      IN LEARNING      LIST OF STIMULI
           IN PRELEARNING       TASK AND         IN POSTLEARNING
           RATINGS              RECOGNITION      RATINGS
           ORDER RATED          TEST             ORDER RATED
SUBJECT    1st     2nd                           1st     2nd     3rd

   1       H1      H2               1            H2      H1      L1
   2       H1      H2               1            L1      H2      H1
   3       H1      H2               2            H2      H1      L2
   4       H1      H2               2            L2      H2      H1
   5       H2      H1               1            H1      H2      L1
   6       H2      H1               1            L1      H1      H2
   7       H2      H1               2            H1      H2      L2
   8       H2      H1               2            L2      H1      H2
   9       L1      L2               1            L2      L1      H1
  10       L1      L2               1            H1      L2      L1
  11       L1      L2               2            L2      L1      H2
  12       L1      L2               2            H2      L2      L1
  13       L2      L1               1            L1      L2      H1
  14       L2      L1               1            H1      L1      L2
  15       L2      L1               2            L1      L2      H2
  16       L2      L1               2            H2      L1      L2
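
The pattern of conditions for one block of 16 subjects follows directly from rules (1) to (3) above, and can be generated rather than typed out. The sketch below is a hedged reconstruction: the function name, the tuple layout, and the codes H1, H2, L1, L2 (meaningfulness level plus list number) are illustrative assumptions, and the 16 rows it yields are intended to correspond to Table III.

```python
from itertools import product


def assignment_rows():
    """Return 16 (prelearning, learning_list, postlearning) condition rows."""
    rows = []
    for level, pre_lists, learn_list, new_position in product(
            "HL", [(1, 2), (2, 1)], [1, 2], ["last", "first"]):
        # Prelearning ratings: both lists at one meaningfulness level.
        pre = tuple(f"{level}{n}" for n in pre_lists)
        # The repeated ratings are given in the reverse of their initial order.
        repeated = pre[::-1]
        # The new set is the other meaningfulness level of the learned list.
        other = "L" if level == "H" else "H"
        new_set = f"{other}{learn_list}"
        if new_position == "last":
            post = repeated + (new_set,)
        else:
            post = (new_set,) + repeated
        rows.append((pre, learn_list, post))
    return rows


rows = assignment_rows()
for subject, (pre, learn_list, post) in enumerate(rows, start=1):
    print(subject, *pre, learn_list, *post)
```

Repeating these 16 rows once for each of the four trial conditions (2, 4, 8, or 16 trials) gives the 64 subject assignments.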

Apart  from  the  fact  that  the  two  ratings  repeated  after  the 
learning  task  always  occurred  adjacent  to  each  other  and  in  the  reverse 
order  that  they  were  assigned  for  the  initial  rating,  the  variables  described 
in  (1),  (2),  and  (3)  above  are  orthogonal  to  each  other.  As  this  constitutes 
a 2 x 2 x 4 x 2 x 2 factorial design, containing 64 cells, it can be seen
that each subject represents a unique combination of these variables.
However, as it seems highly unlikely that there would be a significant
interaction between certain combinations of variables such as the order in which
the  subjects  did  the  post-learning  similarity  ratings  and  the  list  which 
they  attempted  to  learn,  any  meaningful  analysis  of  variance  done  on  the 
data  in  this  design  would  have  at  least  four  subjects  per  cell. 
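
The cell counts in this argument can be verified with a few lines of Python. The factor names and level labels below are descriptive placeholders chosen for this sketch; only the numbers of levels (2, 2, 4, 2, 2) come from the text.

```python
from collections import Counter
from itertools import product

# Five design variables; names are illustrative, level counts follow the text.
FACTORS = {
    "meaningfulness_rated": ["high", "low"],
    "prelearning_order": ["List 1 first", "List 2 first"],
    "trials": [2, 4, 8, 16],
    "learning_list": [1, 2],
    "postlearning_order": ["repeats first", "new set first"],
}

# One subject per cell of the full factorial design.
cells = list(product(*FACTORS.values()))


def subjects_per_cell(ignored):
    """Cell sizes that result when the named factors are collapsed."""
    keep = [i for i, name in enumerate(FACTORS) if name not in ignored]
    counts = Counter(tuple(cell[i] for i in keep) for cell in cells)
    return set(counts.values())


print(len(cells))                                                 # 64
print(subjects_per_cell({"postlearning_order", "learning_list"})) # {4}
```

Collapsing over any two of the two-level variables leaves 16 cells of four subjects each, which is the minimum cell size referred to above.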

A number of other variables were controlled, but not in an
orthogonal design.

(1) Within each of eight groups of eight subjects (each group being defined
by the eight combinations of the two lists used in learning and the four
numbers of trials for which practice continued), the two forms of the
recognition test were used equally often. (Two exceptions occurred to this
rule through an error by the experimenter: in two groups, one recognition
test was used five times, and the other three times.)

(2) Within each of the eight lists-by-trials treatment groups the two
orders of administration of the separate halves of the recognition test
occurred equally often. (Three exceptions to this rule were caused by
experimenter's error. In three groups, one sequence was used five times.
In all, four subjects were affected by these errors and those mentioned in
(1) above, as two errors in procedure were committed in the running of one
subject.)

(3) Except for the previously mentioned errors, all four possible
combinations of the two forms of the recognition test and the two orders of
presentation of the halves were used equally often and randomly assigned to
subjects in each of the eight treatment-combination groups.

For a number of reasons it is felt that these errors in procedure
(which were discovered only after the conclusion of the experiment) were
sufficiently minor that their possible effects on the data can be ignored.
First, while the variables involved (groups of specific stimulus-response
mispairings and sequence effects) could quite conceivably affect the data
of the experiment, they are of little theoretical interest in this study
and were controlled only so that their effects were not confounded with those
of other more important variables. Second, it was considered that the
disruptive effects of these irregularities would be minimal, as the data of
only four subjects out of 64 were affected, and the smallest group that
contained two of these subjects also contained 14 others.

The particular treatment combinations were randomly ordered in
advance, and assigned consecutively to subjects as they arrived at the
laboratory.


Subjects


The  subjects  were  64  University  of  Alberta  students  (33  males  and 
31  females)  enrolled  in  the  introductory  psychology  class  who  took  part  in 
the  experiment  to  fulfill  a  course  requirement.  The  majority  of  subjects 
had  previously  served  in  other  verbal  learning  experiments  using  a  variety 
of  material  and  procedures,  none  of  which  particularly  resembled  those  used 
in  the  present  study. 


Nine other subjects had originally taken part in the experiment,
but were replaced because of equipment breakdown, experimenter error, or
subsequent discovery that they had not followed instructions in the
similarity rating task. Their data were replaced by testing subsequent subjects
in  their  place.  One  subject  found  the  learning  task  upsetting  and  was 
permitted  to  leave  the  experiment. 

Apparatus 


The  stimuli  and  responses  were  presented  visually  to  each  subject 
with  two  "One-plane  Readout"  display  cells  manufactured  by  Industrial 
Electronic  Engineers  Inc.  The  subject  sat  in  a  cubicle  with  the  display 
cell  screens  mounted  flush  in  the  wall  at  eye  level  about  2  feet  away.  The 
letters  making  up  the  stimuli  and  responses  were  projected  as  white  against 
a  dark  background,  and  were  approximately  1  inch  in  height.  The  selection 
of  stimuli  and  responses  was  programmed  with  a  Western  Union  tape  reader 
and  a  bank  of  relays.  The  timing  was  controlled  by  a  synchronous  motor  and 
an  eccentric  cam. 

Procedure


Similarity Ratings. Before the initial rating, the subject was
told to read a dittoed sheet describing the rating procedure (Appendix B).
He was instructed to indicate the most similar and least similar pairs of
stimuli in each triad, with the criterion for similarity to be decided by
him. He was then given his two initial preassigned similarity rating sheets.

For  the  post-learning  similarity  rating,  the  subject  was  merely 
reminded  to  follow  the  same  procedure  as  before  when  he  was  given  his  three 
rating  sheets.  Most  subjects  took  approximately  5  minutes  to  complete  the 
ratings  on  one  set  of  stimuli. 

Learning Task and Recognition Test. After finishing the initial
similarity ratings the subject was seated in front of the display cells
while the instructions (Appendix C) were read to him by the experimenter from
outside the cubicle. The nature and timing of the study-test method was
outlined to him, and he was instructed to call out as many responses as he
could remember on test trials. He also received instructions concerning how
to respond on the recognition test, and was told that this type of test
would be substituted for a recall trial at some point during the learning
task.


After any questions had been answered, the subject was told that
he would be given a practice run on the procedure using pairs of shapes
and numbers. He was then shown two presentations of the eight shape-number
practice pairs, each presentation being followed by a test presentation of
the eight stimuli alone. The pairs were presented for 2 seconds each, and
the stimuli for 4 seconds each. There was a 6-second interval during the
transition from presentation to test, and from test to presentation. If
the subject did not attempt a response during the first recall test trial,
he was reminded that he should try to call out any responses that he could
remember. All subjects attempted a number of responses by the second recall
trial.


A third presentation of the pairs was made 6 seconds after the
second recall test. The subject was then given reminder instructions
(Appendix C) on the recognition test, and shown four of the correct pairings
and four mispairings of the stimuli and responses in a random sequence. He
was given unlimited time to respond to each pair. As all subjects showed
appreciable learning at this point in the procedure, it was possible to
detect the few subjects who had apparently misunderstood the instructions
about the probability estimates and consistently reversed the procedure for
reporting their estimates of the probability of a change (i.e., responded
"zero" instead of "one hundred" when they were certain the pair had been
changed). These subjects were asked to repeat the instructions and were
reminded if they were incorrect.

The  subject  was  then  given  the  preassigned  number  of  presentation 
trials  on  the  preassigned  list.  Each  presentation  except  the  last  was 
followed  by  a  recall  test  trial.  On  the  presentation  trials,  the  12  pairs 
were  shown  for  2  seconds  each;  on  the  test  trials,  the  12  stimuli  were  shown 
for 4 seconds each. All responses and omissions to stimuli were recorded,
including corrections by the subject of an initial response or responses.

A 6-second pause was interposed between presentation and test trials.

After  the  final  presentation  trial,  the  subject  was  read  the 
same  "reminder"  instructions  concerning  the  recognition  procedure  as 
were  used  in  the  practice  procedure.  Approximately  15  seconds  elapsed 
between the end of the last presentation trial and the beginning of the
recognition  test,  while  the  experimenter  read  the  instructions  and  adjusted 
the  apparatus.  The  subject  was  then  shown  the  24  pairs  of  the  recognition 
test,  each  pair  being  shown  until  the  subject  responded.  The  subject's 
final  response  to  each  pair  was  recorded.  After  the  recognition  test  was 
completed,  the  subject  was  told  to  rate  for  similarity  the  last  three  sets 
of  stimuli,  using  the  same  procedure  as  before,  and  was  then  dismissed. 

Treatment of Similarity Rating Data. The data were first converted
from their triadic form to paired rankings, and from these the best complete
order of ranking of the 15 pairs of stimuli from each set was determined.
The criteria by which a given order was decided to be best are discussed in
Chapter III. The procedure for determining the best order was as follows:

(1) The method of triangular analysis (Coombs, 1964) was used to determine
the best partial order (including ties between pairs which had not been
compared directly) of the pairs which could be derived from the data. As
this involved permuting columns and rows of a 15 x 15 matrix, a computer
program was used which first provided a rough solution by ranking the sums
of entries in the rows and thereby reduced the labor of doing the remaining
adjustments manually.

(2) The best complete order of the pairs was then determined manually from
the partial order in (1) by applying the criteria discussed in the next
chapter.
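
The rough solution mentioned in step (1), ranking the sums of entries in the rows of the dominance matrix, might be sketched as follows. This is only an illustration under stated assumptions: the original program is not described in detail, the function name `rough_order` is hypothetical, and a toy 4 x 4 matrix stands in for the 15 x 15 matrix of the study.

```python
def rough_order(labels, dominance):
    """Order items by row sums of a 0/1 dominance matrix (largest first).

    dominance[i][j] == 1 means item i was judged greater than item j.
    This is an assumed reading of "ranking the sums of entries in the rows".
    """
    row_sums = [sum(row) for row in dominance]
    # sorted() is stable, so items with equal sums keep their original
    # relative order and would still need manual adjustment.
    order = sorted(range(len(labels)), key=lambda i: -row_sums[i])
    return [labels[i] for i in order], row_sums


# Hypothetical toy data, not taken from the experiment.
labels = ["AB", "AC", "BC", "CD"]
dominance = [
    [0, 1, 1, 1],  # AB judged greater than AC, BC, CD
    [0, 0, 0, 1],  # AC judged greater than CD
    [0, 1, 0, 1],  # BC judged greater than AC, CD
    [0, 0, 0, 0],  # CD judged greater than nothing
]
ordered, sums = rough_order(labels, dominance)
print(ordered)  # ['AB', 'BC', 'AC', 'CD']
```

Permuting the rows and columns into this order brings most of the "greater" entries above the diagonal, leaving only the remaining adjustments to be made by hand.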

CHAPTER III

Results 

Analysis  of  Similarity  Rating  Data 

The  first  step  in  the  analysis  of  the  data  is  the  derivation  of 
a  set  of  ranked  interstimulus  distances  from  each  subject's  responses  on 
the  similarity  rating  tasks.  There  exist  a  number  of  possible  procedures 
for  accomplishing  this  purpose  which  correspond  to  given  data  reduction 
models  or  to  assumptions  concerning  the  processes  underlying  the  pattern 
of  response.  Before  the  procedure  used  in  this  study  is  described,  a  brief 
review of the objectives of the experiment will be helpful for an
understanding of the rationale behind the techniques used.

The  purpose  of  this  study  is  to  evaluate  a  procedure  for  measuring 
the degree of similarity between a number of verbal stimuli. It was
decided to develop a technique for producing an ordinal scale, i.e., a
rank-order of interstimulus distances, for each subject.

A major task in the evaluation of this technique is to demonstrate
that it can produce results of satisfactory reliability and validity.
Measures of reliability and validity typically involve the calculation of
coefficients of correlation between the test being studied and some criterion
measure; in the present study, the most appropriate measure of correlation
is the rank correlation coefficient, or Spearman's rho (Siegel, 1956). The
calculation of Spearman's rho involves ranking a set of individuals on two
variables. The smaller the differences between the two ranks each subject
receives for his scores on the two variables, the greater the value of rho,
and the greater the relationship indicated between the two variables.

To determine the rank correlation coefficient it is not necessary
that individuals be allotted different scores on each of the two variables.
It is possible to calculate a correlation coefficient when a number of
individuals have the same score on a variable (i.e., when the measure of the
variable does not discriminate between the amounts of the variable that
several subjects possess). The subjects with equal scores are allotted
the same rank, which is the median of the ranks that the group would have
had if they had been differentiated. The coefficient of correlation is
then calculated as before.
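
The treatment of tied scores described above can be illustrated with a short sketch. It assumes that rho is computed as the Pearson correlation of the shared ranks; the function names `midranks` and `spearman_rho` are introduced here for illustration only and are not taken from the study.

```python
def midranks(scores):
    """Rank scores from 1 (smallest); tied scores share one rank.

    For a run of consecutive ranks the mean and the median coincide,
    so the shared rank below matches the rule stated in the text.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # middle of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the shared ranks of x and y.

    Undefined (division by zero) if all scores on a variable are tied.
    """
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


print(midranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

The two tied scores would have held ranks 2 and 3 had they been differentiated, so each receives 2.5.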

However, as the calculation of Spearman's rho is an adaptation of
the Pearson correlation coefficient based on the assumption that the
intervals between ranks are equal, the estimate of the relationship between
two variables through rho will be distorted if too many tied ranks are
allowed to occur. It is obvious, then, that any demonstration of
statistically significant reliability and validity of the measurement
procedure under study must avoid the possibility that this significant
relationship is trivial. In the interests of rigorously evaluating the
procedure being studied, therefore, an attempt was made to minimize the
number of tied ranks in the ordinal inter-stimulus distance scale derived
from the similarity rating data.

This was done by applying successive assumptions to the data,
each assumption in the series yielding information not provided by the
preceding assumption. However, the certainty with which the information
yielded by a given assumption was regarded decreased with the assumption's
rank in the series. To put this in another way, a number of different models
were used to interpret the data. These models are rank ordered so that if
they yield conflicting information, the results of a higher-ranked model
would be accepted over those of a lower-ranked model. In general, each model
yields some unique information and some information shared with that yielded
by other models (where these several sources of information might be in
agreement, or conflicting). The plan is to derive a rank-ordering of the
interstimulus distances by first using all of the information provided by
the model in which the most confidence can be placed, then using the
information from the next-ranked model which was not supplied by the first
model, and so on.


The first assumption is that the rank-ordering of the inter-stimulus
distances for a given set of similarity rating data is transitive. From
this it follows that the dominance relations between stimulus pairs must be
transitive, i.e., if AB > BC, and BC > AC, then AB > AC, where XY signifies
"the distance between stimuli X and Y". In the case where the data indicate
that AB > BC and BC > CD, but the distances AB and CD have not been compared
in the experimental procedure, by the first assumption we can conclude that
AB > CD.
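
The inference licensed by the first assumption can be sketched as a transitive closure over the observed dominance relations. The function name `transitive_closure` is hypothetical, and the sketch assumes the observed relations contain no intransitive cycles.

```python
def transitive_closure(relations):
    """Return every (greater, lesser) pair implied by transitivity.

    Assumes the input is free of cycles; with a cycle the loop would
    also produce self-pairs such as (AB, AB).
    """
    closed = set(relations)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                # a > b and b > d together imply a > d.
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


observed = {("AB", "BC"), ("BC", "CD")}
implied = transitive_closure(observed) - observed
print(implied)  # {('AB', 'CD')}
```

The pair (AB, CD) was never rated directly, but follows from the two observed relations.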


However, it is sometimes the case that three pairs of distances
have been rated in the data but have indicated an intransitive relationship.
For example, a subject may have responded that AB > BC, BC > AC, and AC > AB.
In this case, it is assumed that this intransitive series is due to an error
in rating by the subject. Where one or more intransitivities occur in a
subject's similarity ratings the most probable true rank-order of the
inter-stimulus distances is assumed to be the one which contradicts the data
least, i.e., that would require the fewest alterations to the data to
produce. If more than one ranking of the distances can be achieved with the
same minimal number of data changes, each distance is allocated the mean of
the ranks that it receives in the different orderings.
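
The rule for intransitive data can be illustrated by an exhaustive search over complete orders. This is a sketch under stated assumptions, not the manual procedure used in the study: the names `best_orders` and `mean_ranks` are hypothetical, and trying every permutation is practical only for a handful of distances, not for the full set of 15.

```python
from itertools import permutations


def best_orders(items, judgments):
    """Find the complete orders that contradict the fewest judgments.

    judgments is a list of (greater, lesser) pairs from the ratings.
    """
    best, best_cost = [], None
    for order in permutations(items):
        position = {item: i for i, item in enumerate(order)}  # 0 = greatest
        # A judgment is contradicted when the "greater" item is placed lower.
        cost = sum(
            1 for greater, lesser in judgments
            if position[greater] > position[lesser]
        )
        if best_cost is None or cost < best_cost:
            best, best_cost = [order], cost
        elif cost == best_cost:
            best.append(order)
    return best, best_cost


def mean_ranks(items, orders):
    """Average each item's rank (1 = greatest) over equally good orders."""
    return {
        item: sum(order.index(item) + 1 for order in orders) / len(orders)
        for item in items
    }


items = ["AB", "BC", "AC"]
cycle = [("AB", "BC"), ("BC", "AC"), ("AC", "AB")]
orders, cost = best_orders(items, cycle)
print(cost, len(orders))          # 1 3
print(mean_ranks(items, orders))  # every distance receives mean rank 2.0
```

For the circular triad in the example, three orders each require one change to the data, so the three distances end up tied at the mean rank.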

Frequently sets of partial rankings such as

AB > BC > CD > DE
and AB > AE > DE

are found, where AE had not been compared with BC or CD. The position of
AE is indeterminate relative to BC or CD by the first assumption. The
second assumption, which covers cases such as this, is that if

AB > AE > DE,

then

$$E(AB - AE) = E(AE - DE),$$

and if

AB > BC > CD > DE,

then

$$E(AB - BC) = E(BC - CD) = E(CD - DE),$$

where $E(X)$ is the expected value of $X_i$ in the population $X$. On
the assumption that, in general, $E(UV + WX) = E(UV) + E(WX)$, from the first
equation it can be seen that

$$E(AE) = E(AB + DE)/2,$$

and from the second equation

$$E(BC) = E(AB + CD)/2$$

and

$$E(CD) = E(BC + DE)/2.$$

Since CD > DE, therefore $E(BC) > E(AE)$, and since AB > BC, therefore
$E(AE) > E(CD)$.

From the above it can be seen that the most probable ranking of these five
distances, according to the second assumption, is AB > BC > AE > CD > DE.
It is thus possible to produce a mutual ordering of the intermediate
distances in two unequal series of distances when the first and last members
of the series are the same for the two series.
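
The merging of two unequal series under the second assumption can be sketched by placing each series on a common scale with equal steps between neighbours. The scale from 0 to 1 and the function names are assumptions made for illustration; only the resulting order matters.

```python
from fractions import Fraction


def equal_interval_positions(series):
    """Place a descending series on [0, 1] with equal steps between neighbours."""
    steps = len(series) - 1
    return {item: Fraction(steps - i, steps) for i, item in enumerate(series)}


def merge_series(first, second):
    """Merge two descending series that share their first and last members.

    Assumes the two series have no other members in common.
    """
    if first[0] != second[0] or first[-1] != second[-1]:
        raise ValueError("series must begin and end with the same members")
    positions = equal_interval_positions(first)
    positions.update(equal_interval_positions(second))
    # Larger position = larger expected distance, so sort in descending order.
    return sorted(positions.items(), key=lambda kv: -kv[1])


merged = merge_series(["AB", "BC", "CD", "DE"], ["AB", "AE", "DE"])
print([item for item, _ in merged])  # ['AB', 'BC', 'AE', 'CD', 'DE']
```

Exact fractions are used so that genuinely equal positions, which arise when the two series are equal in length, show up as ties rather than being separated by rounding error.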

The third assumption can sometimes produce a ranking not given by
the first or second assumptions, where the data give the following cases:

(1) AB > BC > CD, and AB > AD > CD, or

(2) AB > BC, BC > AC, and AC > AB

In the first case, the non-compared distances BC and AD are
intermediate between AB and CD, but the ranking cannot be resolved by
invoking Assumption 2 as the series are equal in length. The second case
produces three equally probable rankings, according to Assumption 1:

AB > BC > AC, AC > AB > BC, and BC > AC > AB.

Assumption 3 states that the rank order of a given inter-stimulus
distance is a function of the number of other distances that it surpasses.
Thus, to resolve the indeterminate relative rankings of AD and BC in Case 1
above, and of AB, BC, and AC in Case 2, highest ranking is given to the
distance that dominates the largest proportion of all the distances with
which it was compared in the similarity ratings, second highest rank is
given to the distance that had the next largest sum of "greater" judgments,
and so on.
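
A tie of the kind described in Case 1 might be broken as in the following sketch. The ratings are invented for illustration, the function names are hypothetical, and the proportion of "greater" judgments is used as the criterion.

```python
from collections import defaultdict


def dominance_proportions(judgments):
    """Proportion of its comparisons in which each distance was judged greater."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for greater, lesser in judgments:
        wins[greater] += 1
        total[greater] += 1
        total[lesser] += 1
    return {item: wins[item] / total[item] for item in total}


def break_tie(tied, judgments):
    """Order tied distances by descending proportion of 'greater' judgments."""
    prop = dominance_proportions(judgments)
    return sorted(tied, key=lambda item: -prop.get(item, 0.0))


# Hypothetical ratings: BC and AD were never compared with each other,
# and no chain of judgments links them.
judgments = [
    ("AB", "BC"), ("BC", "CD"), ("AB", "AD"), ("AD", "CD"),
    ("BC", "EF"), ("DF", "AD"),
]
print(break_tie(["AD", "BC"], judgments))  # ['BC', 'AD']
```

Here BC was judged greater in two of its three comparisons and AD in one of its three, so BC receives the higher rank.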


In summary, the procedure for determining the rank order of the
interstimulus distances from the similarity ratings runs as follows:

(1) The distances were arranged in the complete rank order that
required the fewest number of changes in the subject's pair-wise similarity
ratings. Occasionally this was all that was required for a given set of
data; each distance had been compared in the similarity ratings with the
distances immediately above and below it. More often, two or more series of
distances derived from a set of ratings would begin and end with common
members, but the intermediate members of the chains had not been compared.

(2) When two series, containing x and y stimulus pairs, respectively,
began and ended with common members, where $x \neq y$, each of the x - 1
differences between succeeding interstimulus distances in the first series
was set as equal to $1/(x - 1)$, and the y - 1 differences between adjacent
pairs in the second series were each set equal to $1/(y - 1)$. The two series
were then merged into a common order.

Occasionally it was found that three or more series occurred in a
given set of data having different initial and final members. This created
a problem, as different common orders could be found depending on which
two series were picked to be merged first. It was decided that in cases
such as this the longest partial order including the greatest and least
distances in the set would be used as a standard, and that all shorter
partial orders would be subordinated to this.

(3) In cases where there were two non-compared pairs tied for
the same ranks, including cases of parallel series each containing an
identical number of members, the ties were broken by allotting the higher rank
to the pair which dominated the higher proportion of all the pairs to which
it had been compared. This stratagem was also used to resolve ties caused
by circular chains due to intransitivities; the highest rank went to the
pair in the chain having the highest total "vote count", and the remaining
pairs were ordered according to the pair-wise ranking of the similarity
estimation data.

It can be noted that Assumption 1 is what Coombs (1964) calls a
"decomposition model". He recommends it for the study of similarities data
gathered through the present method because it uses only that information
concerning the relative similarity of a pair of distances that was derived
from a subject's ratings of the two distances together. Assumption 2 has
not, to the writer's knowledge, been used in this type of scaling before;
its chief virtue is its parsimony in postulating that the differences
between a given interstimulus distance and the distances immediately above
and below it are equal. Whether this assumption is valid is a question of
fact, and would seem best studied empirically. The third assumption
corresponds to what Coombs (1964) refers to as an "expected matrix model"; he
does not recommend its use for the present method of gathering similarities
data because it utilizes information about a pair of distances derived from
comparisons of the single distances with different groups of other distances.
This, he feels, distorts the rankings obtained because of greater context
effects and the greater instability of differing comparisons. However, this
assumption was felt to be useful because of the information it provided on
non-compared pairs.

No other justification of these assumptions will be presented at
this time, apart from mentioning that they appear to have achieved their
intended purpose of minimizing ties in the distance rankings. Out of 320
rankings of 15 interstimulus distances there occurred only 285 ties involving
two distances, 55 ties involving three distances, and 14 ties involving four
distances. As a two-way tie means that the ranking of one pair of distances
is indeterminate, a three-way tie that the rankings of three pairs are
indeterminate, and a four-way tie renders indeterminate six paired rankings,
it can be seen that the ranking of only 534 pairs of distances was
unavailable from the above procedure. This represents less than 1% of the
total of 67,200 rankings of pairs of distances implicit in the data of this
experiment. Further scrutiny of the assumptions is deferred not because it
is felt that the procedure is free from major criticism, but because it is
felt that this discussion is best postponed until these assumptions can be
studied in the light of relevant empirical evidence. The purpose of the
present study is to validate the procedure by demonstrating that similarity
ratings are related to generalization errors in paired-associate learning.

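
The count of indeterminate paired rankings can be checked with a few lines of arithmetic. The tie counts and the total of 67,200 are taken as reported in the text (the total equals 320 x 15 x 14); the variable names are introduced here for illustration.

```python
from math import comb

# Tie counts reported in the text: {number of distances tied: number of ties}.
ties = {2: 285, 3: 55, 4: 14}

# A k-way tie leaves comb(k, 2) paired rankings undetermined.
indeterminate = sum(count * comb(k, 2) for k, count in ties.items())

total_reported = 67_200  # total given in the text
share = 100 * indeterminate / total_reported

print(indeterminate)    # 534
print(round(share, 2))  # 0.79
```

A two-way tie contributes one indeterminate pair, a three-way tie three, and a four-way tie six, which reproduces the figure of 534.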
Reliability  of  Similarity  Rankings 

Internal Consistency. After the five sets of distance rankings
had been established for each subject, the data were examined with regard
to corrections for intransitivities in the rankings. As the procedure
assumes that the pair-wise comparisons can lead to a transitive ordering of
the interstimulus distances, whether or not the intransitivities found can
be ascribed to chance error in the similarities data is of interest in
verifying this assumption. If the average number of intransitivities were
larger than could be reasonably attributed to measurement error, then the
validity of the basis of the entire procedure would be in serious doubt.

It was found that the mean number of intransitivities per test
over all tests on all subjects was 2.92. Little is known about the
characteristics of the expected distribution of intransitivities in this
data-collection method, so no test of significance can be applied to these
data. However, as this figure indicates that of the 60 paired comparisons
made by each subject, an average of less than 5% were inconsistent with each
other, it would seem that this finding supports the assumption that the
underlying similarity relationship for the stimuli studied is essentially
transitive.

The occurrence of intransitivities is of interest because it
gives an index of the reliability of the similarity rankings (Vastenhouw,
1962), and can be considered as a measure of the internal consistency of
the measuring instrument. The intransitivities data were examined to
determine whether frequency of intransitivities is relatively stable over all
conditions, or whether subjects behave more inconsistently under some
circumstances than under others. The effects on frequency of
intransitivities of the variables of previous experience in ranking for
similarity, meaningfulness of material, specific list material, and
experience with the material as paired-associate stimuli were studied to
evaluate the data's potential for further analysis.

It  will  be  recalled  that  each  subject  rated  for  similarity  five 
sets  of  six  stimuli.  Six  of  the  stimuli  in  the  pairs  to  be  learned  and  six 
control  stimuli  were  rated  before  the  learning  task,  and  then  the  ratings 
were  repeated  after  learning.  The  material  in  these  ratings  was  either  all 
high-meaningful  or  all  low-meaningful  for  a  given  subject.  In  addition, 
each  subject  rated  the  second  set  of  experimental  stimuli  (of  opposite 
meaningfulness  to  the  set  previously  rated)  after  the  learning  task,  this 
material,  of  course,  being  rated  for  the  first  time. 

An inspection of the data suggested that this last-mentioned
material had a markedly greater number of intransitivities associated with
it than did the other sets (see Table IV). An analysis of variance over the
five ratings showed a significant F-ratio for differences between
treatments, F(4, 252) = 10.05, p < .005. When the mean number of
intransitivities for the experimental material rated for the first time after
learning is compared to the other four means combined, an F-ratio of 39.46
is found.


TABLE IV

MEAN NUMBER OF INTRANSITIVITIES IN FIVE SIMILARITY
RANKINGS OF STIMULI

        Before         Before     Experimental   Control     Exper. non-
        Experimental   Control    Repeated       Repeated    Repeated      Total

Mean    2.63           2.45       2.66           2.72        4.23          2.92

Because this is an a posteriori test, Scheffé's (Winer, 1964) procedure is
used to determine the criterion for significance. The value of F necessary
to reach significance at the .005 level of probability for 4 and 252 degrees
of freedom by this procedure is about 24.0; thus the observed difference is
clearly significant. No other differences between treatment means were
significant. It may be noted that the F-ratio (F(63, 256) = 2.338) for
between-subjects effects was significant at the .001 level, indicating that
there are reliable individual differences in frequency of intransitivities.

A  possible  explanation  for  the  difference  between  treatment  means 
is  that  the  subjects  became  set  in  the  dimensions  of  similarity  that  they 
attended  to  after  the  first  two  or  four  similarity  ratings,  which  were  all 
of  the  same  level  of  meaningfulness.  Consequently,  when  asked  to  rate 
material  of  a  different  level  of  meaningfulness,  they  found  it  difficult 
to  adjust  to  different  criteria  of  similarity,  and  as  a  result  showed  an 
increase  in  the  inconsistency  of  their  ratings. 

It was next decided to determine if there were differences in
inconsistency of response as a function of the meaningfulness of the material
and the two lists of stimuli used. This analysis was done on only the
material rated before the learning task, as it was wished at this point to
determine the effects of meaningfulness and list without the added
complication of whether or not the material had occurred as stimuli in a
learning situation, or whether the material was being scaled for the first
or second time. A 2 x 2 factorial analysis of variance, with repeated
measures on one variable, was carried out. The means appear in Table V and
a summary of the analysis in Table VI.

TABLE V

MEAN NUMBER OF INTRANSITIVITIES IN PRE-LEARNING SIMILARITY
RANKINGS AS A FUNCTION OF MEANINGFULNESS AND LIST

                               List
                          1        2       Mean

Meaningfulness   High    2.91     2.31     2.61
                 Low     2.09     2.84     2.48

                 Mean    2.50     2.58     2.54


TABLE VI

SUMMARY OF ANALYSIS OF VARIANCE OF NUMBER OF
INTRANSITIVITIES IN PRE-LEARNING SIMILARITY RANKINGS

Source                          df        SS        MS        F

Between subjects
  Meaningfulness (M)             1       0.633     0.633    < 1
  Subjects within groups        62     221.672     3.373

Within subjects
  Lists (L)                      1       0.196     0.196    < 1
  M x L                          1      14.445    14.445    6.74*
  L x Subjects within groups    62     132.859     2.143

* Significant at the .05 level of probability.


It can be seen that the only significant effect is the interaction
between lists and levels of meaningfulness. Table V shows that the
high-meaningful material in List 1 produced more inconsistent response than did
the low-meaningful material, whereas the converse was true of List 2. As
it will be recalled that an attempt was made to design the two lists to be
as equivalent as possible, it would seem that there are potent
material-specific effects on consistency of rating that cannot be predicted
from general properties such as meaningfulness.
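
The crossover pattern behind the significant M x L interaction can be made explicit from the cell means of Table V. The sketch below only restates those means; the variable names are introduced for illustration.

```python
# Cell means from Table V (mean intransitivities before learning).
means = {
    ("high", 1): 2.91, ("high", 2): 2.31,
    ("low", 1): 2.09, ("low", 2): 2.84,
}

# Simple effect of meaningfulness (high minus low) within each list.
effect_list1 = means[("high", 1)] - means[("low", 1)]
effect_list2 = means[("high", 2)] - means[("low", 2)]

# The interaction contrast is the difference between the two simple effects.
interaction = effect_list1 - effect_list2

print(round(effect_list1, 2))  # 0.82
print(round(effect_list2, 2))  # -0.53
print(round(interaction, 2))   # 1.35
```

The simple effect of meaningfulness changes sign between the two lists, which is why neither main effect approaches significance while the interaction does.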

The  final  statistical  evaluation  of  the  intransitivities  data 
concerned  the  combined  effects  of  meaningfulness,  list,  and  degree  of 
exposure  to  the  material  as  stimuli  in  a  paired-associate  learning  task  on 
the stability of consistent responding. Changes in frequency of
intransitivities between before and after the use of the rated material as
stimuli in 2, 4, 8, and 16 trials of paired-associate learning were studied in
an  attempt  to  determine  if  distance-ranking  data  measures  taken  at  various 
points  in  a  learning  experiment  could  be  considered  equally  reliable. 

This  was  done  by  an  analysis  of  variance  on  the  two  differences 
(in  the  experimental  and  control  lists)  between  number  of  intransitivities 
before and after the learning task for each subject. The data were
transformed to eliminate negative values by adding a constant of 8 to
each score.


It can be seen from Table VII that only the effect due to the
interaction between levels of meaningfulness and trials is significant.
This relationship is depicted graphically in Fig. 1, which shows that for
the high-meaningful material the frequency of intransitivities decreased
slightly after 2, 4, and 16 trials of learning, but showed a sharp increase
after 8 trials, while the curve for low-meaningful material is U-shaped,
being highest for 2 and 16 trials, and lowest for 4 and 8 trials. No
interpretation can be suggested for this configuration. Considering that
the statistical significance of this difference was marginal, that it was
one of 15 independent tests in the analysis of variance, and that no
meaningful interpretation seems apparent, further discussion of this effect
would
seem fruitless.

TABLE VII

SUMMARY OF ANALYSIS OF VARIANCE OF CHANGE IN NUMBER OF
INTRANSITIVITIES IN SIMILARITY RANKINGS BETWEEN BEFORE
AND AFTER PAIRED-ASSOCIATE LEARNING

Source                          df        SS        MS        F

Between subjects
  Meaningfulness (M)             1       0.945     0.945    < 1
  Trials (T)                     3      12.710     4.237    1.228
  M X T                          3      37.960    12.653    3.668*
  E X L                          1       0.070     0.070    < 1
  E X L X M                      1      11.882    11.882    3.444
  E X L X T                      3       9.460     3.153    < 1
  E X L X M X T                  3      14.898     4.966    1.439
  Subjects within groups        48     165.625     3.450     --

Within subjects
  Experimental condition (E)     1       0.382     0.382    < 1
  E X M                          1       5.695     5.695    < 1
  E X T                          3       3.523     1.174    < 1
  E X M X T                      3      13.460     4.487    < 1
  List (L)                       1       2.820     2.820    < 1
  L X M                          1       1.320     1.320    < 1
  L X T                          3       6.710     2.237    < 1
  L X M X T                      3       2.460     0.820    < 1
  Residual (within)             48     320.125     6.669     --

Total                          127

* Significant at the .05 level of probability.


To summarize the findings with respect to internal consistency of
the similarity measure as indicated by frequency of intransitivities, it
was found that subjects' ratings showed a high level of consistency overall.
The data seemed to show that subjects became more inconsistent in their
ratings when they were given a set of stimuli whose meaningfulness was higher
or lower than that of the previous sets that they had rated. Significant
differences in rating consistency were found as a function of specific sets
of stimuli; these differences did not seem predictable from the degree of
meaningfulness of the stimuli, nor could they be controlled by constructing
lists according to apparently similar principles.

Test-Retest Reliability. The next step in the evaluation of the
similarity  rankings  was  an  examination  of  their  test-retest  reliability. 

This  was  done  by  first  calculating  test-retest  correlation  coefficients 
between  the  pre-  and  post-learning  control  stimulus  rankings,  and  then 
studying  the  stability  of  reliability  coefficients  under  various  variables. 

Table  VIII  shows  the  mean  test-retest  rank  correlation  coefficients 
over  16  subjects  for  the  derived  rankings  of  four  control  sets  of  stimuli 
defined  by  the  combination  of  two  lists  and  two  levels  of  meaningfulness. 

The  control  sets  of  stimuli,  it  will  be  remembered,  are  the  ones  which 
were  not  used  in  the  learning  task;  thus  the  operation  used  to  determine 
these correlation coefficients was to have each subject rate for similarity
a set of six stimuli, next attempt a learning task involving different
stimuli, and then re-rate the original six stimuli.

TABLE VIII

MEAN RANK ORDER TEST-RETEST
RELIABILITY COEFFICIENTS FOR CONTROL STIMULI

                                List
Meaningfulness           1         2        Mean

High                   0.597     0.699     0.648
Low                    0.783     0.712     0.748

Mean                   0.690     0.706     0.698

It should be emphasized that these coefficients are a measure of the
correlation between the two sets of rankings derived from the before and
after ratings.
Figure 1. Mean change in number of intransitivities before and
after learning.

All the mean correlation coefficients in Table VIII, if reported
for a single subject, would be significant at the .05 level of probability.
Of the 16 subjects rating the List 1 high-meaningful stimuli, 11 had
reliability coefficients significant at p < .05; for the List 2
high-meaningful material, 13 out of 16 coefficients were significant at less
than the .05 level of probability; for the ratings of the List 1
low-meaningful stimuli, 15 out of 16 coefficients were significant at less
than the .05 level; while for the List 2 low-meaningful stimuli 14 out of 16
reliability coefficients were significant at or beyond the .05 level of
probability.

Of the 64 individual reliability coefficients calculated, 6 were
significant with .01 < p < .05, 16 were significant with .001 < p < .01,
and 30 with p < .001, while only 12 coefficients were not significant at or
below the .05 level of probability.

Considering the abbreviated nature of the measuring instrument
and the fact that the reliability coefficients express the consistency of
not only the subjects' rating behavior but also of the procedure that was
used to derive the rankings of the distances, the observed mean reliability
coefficient of 0.698 would seem to indicate a satisfactory level of
stability. It is interesting to note that the average reliability seems to be
higher for the low-meaningful stimuli than for the high-meaningful material.
Also, within levels of meaningfulness, the higher correlation coefficients
are associated with the lower mean frequencies of inconsistencies (Table V),
and vice versa.

Having established that the procedure under study is moderately
reliable, the effects of the variables of list, meaningfulness, use of the
stimuli as experimental or control material in a learning task, and number
of trials of attempted learning upon the consistency of subjects' ranking
of the interstimulus distances were studied.

This was done through an analysis of variance on the reliability
coefficients of the rankings of the control and experimental stimuli. The
data were transformed by first converting the values of the correlation
coefficients to Fisher's Zr to normalize the sampling distribution, and then
adding 1 to all scores to eliminate minus values. A summary of the analysis
of variance is presented in Table IX. It can be seen from this summary
that only two effects were statistically significant: levels of
meaningfulness, and the four-way interaction between experimental condition,
list, meaningfulness and trials.
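The transformation described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and Fisher's Zr is taken in its usual form, the inverse hyperbolic tangent of the correlation coefficient.

```python
import math


def fisher_z_plus_one(r: float) -> float:
    """Fisher's z transform of a correlation, shifted by +1 as in the text.

    The function name is hypothetical; the thesis gives no code.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # Fisher's z is 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r).
    return math.atanh(r) + 1.0


def back_to_r(score: float) -> float:
    """Invert the shifted transform to recover the correlation."""
    return math.tanh(score - 1.0)


if __name__ == "__main__":
    # 0.698 is the overall mean reliability reported in Table VIII.
    print(fisher_z_plus_one(0.698))
```

Note that adding 1 removes negative scores only for coefficients above tanh(-1), roughly -0.76; presumably no coefficient in the data fell below that value.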

A graph of the four-way interaction is shown in Figure 2. It
would seem difficult to provide a meaningful interpretation of this
apparently random relationship. When it is considered that the estimate of
the effects of the four-way interaction is based on a between-groups
comparison, each point on the graph being determined by the data from only
four subjects, that the statistical significance of this interaction was
marginal, and that none of the four three-way interactions has an F-ratio
greater than unity, no reasonable explanation for the magnitude of the
four-way interaction seems apparent. It might be pointed out that the
four-way interaction could be regarded as the interaction between the effect
of the three-way interaction of experimental condition, meaningfulness, and
trials (a theoretically meaningful interaction with a negligible mean square)
and the effect of the factor of lists (a factor of little theoretical
interest, as it represents essentially random item-specific differences).
TABLE IX

ANALYSIS OF VARIANCE OF TEST-RETEST RELIABILITY
COEFFICIENTS FOR RANKED INTER-STIMULUS DISTANCES

Source                            df        SS        MS        F

Between subjects
  Meaningfulness (M)               1     2.732     2.732    8.229**
  Trials (T)                       3     1.600     0.533    1.777
  M X T                            3     0.347     0.116      < 1
  E X L                            1     0.004     0.004      < 1
  E X L X M                        1     0.278     0.278      < 1
  E X L X T                        3     0.727     0.242      < 1
  E X L X M X T                    3     2.890     0.963    2.901*
  Subjects within groups          48    15.956     0.332        -

Within subjects
  Experimental condition (E)       1     0.040     0.040      < 1
  E X M                            1     0.054     0.054      < 1
  E X T                            3     0.516     0.172    1.186
  E X M X T                        3     0.040     0.013      < 1
  List (L)                         1     0.030     0.030      < 1
  L X M                            1     0.571     0.571    3.938
  L X T                            3     0.915     0.305    2.103
  L X M X T                        3     0.417     0.139      < 1
  Residual (within)               48     6.980     0.145        -

Total                            127

*  Significant at .05 level of probability

**  Significant at .01 level of probability


Concerning the significant meaningfulness effect, the mean
reliability coefficient for the high-meaningful material is 0.665 and for the
low-meaningful material is 0.768. The difference between these figures is
approximately equal to that reported earlier for the control material alone.
Thus, it can be concluded that the reliability of similarity rankings is
greater for the low-meaningful material than for the high-meaningful stimuli.

The hypothesis that extensive experience with stimuli in a
paired-associate learning task might change the similarity relations between
the stimuli was considered next. If this hypothesis were correct, it would be
expected that the test-retest reliability of the experimental material
would decrease as a function of trials, while the reliability of the
control stimuli would remain relatively unchanged. If this occurred, it should
be evidenced in the analysis of variance as a significant experimental
conditions by trials interaction. No such significant interaction is
apparent in Table IX.


Fig. 3 shows a graphical representation of the relevant data.

It can be seen that the difference in reliability between experimental
conditions is relatively constant for the groups tested after two, four,
and eight trials, but shows the predicted drop for the experimental
condition on Trial 16. It is possible that a large number of learning trials
must occur before the hypothesized drop in reliability is found, and that this
drop had been obscured by having three tests during the first eight trials
of learning but only one test on the sixteenth trial. If this were the
case, then the correct test of the hypothesis would be to compare the
difference between control and experimental material on Trial 16 with the
combined experimental-control differences on Trials 2, 4, and 8. However,
the F-ratio for this comparison was only 3.21 with 1 and 48 degrees of
freedom. As an F-ratio of 4.04 is necessary for significance at the .05
level with these degrees of freedom, the hypothesis that there is no
difference in reliability between experimental and control material for
differing degrees of practice on a paired-associate learning task cannot be
rejected on the evidence presented here.

Degree of Concordance Among Subjects. It would be of interest to
know if all subjects gave basically the same rankings when rating each set
of stimuli, or if there are individual differences in the rankings of
different subjects. It had been hypothesized earlier in this paper that
there are appreciable differences in inter-subject consistency in ranking
the interstimulus distances for highly meaningful stimuli. To measure the
amount of agreement between subjects on the rankings of the four sets of
stimuli, Kendall's coefficient of concordance (Siegel, 1956) was
calculated for these data (Table X). All reported values of W are significant
at the .001 level of probability. The coefficient of concordance is a
measure of the amount of agreement between a number of subjects on the
ranking of a set of stimuli (here, distances). "Concordance" is a function
of the average rank correlation coefficient that would be found if this
were calculated for all possible pairs of rankings; Table X also shows this
average value of rho, the rank correlation coefficient for the four sets
of stimuli.
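The functional relation mentioned here can be made concrete with a short sketch. The standard identity linking Kendall's W to the average pairwise Spearman coefficient is rho_av = (mW - 1) / (m - 1), where m is the number of judges. The value m = 32 used below is an assumption: it is not stated in this passage, but it is the number of subjects per set that reproduces the printed values of Table X to within rounding. The function and variable names are illustrative.

```python
def average_rho_from_w(w: float, m: int) -> float:
    """Average pairwise Spearman rho implied by Kendall's W for m judges."""
    if m < 2:
        raise ValueError("at least two judges are required")
    return (m * w - 1.0) / (m - 1.0)


# W and average rho as printed in Table X: (meaningfulness, list) -> (W, rho_av).
TABLE_X = {
    ("high", 1): (0.303, 0.281),
    ("high", 2): (0.346, 0.325),
    ("low", 1): (0.718, 0.709),
    ("low", 2): (0.681, 0.671),
}

ASSUMED_JUDGES = 32  # assumption inferred from the table, not given in the text

if __name__ == "__main__":
    for key, (w, printed) in TABLE_X.items():
        print(key, round(average_rho_from_w(w, ASSUMED_JUDGES), 3), printed)
```

Because W ranges from 0 to 1 while the average rho can be slightly negative, the two statistics differ most when agreement is low, which is why the gap between W and the average rho is larger for the high-meaningful sets.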


TABLE X

COEFFICIENTS OF CONCORDANCE (W) AND AVERAGE RANK CORRELATION (RHOav)
FOR FOUR SETS OF STIMULI

                                        Meaningfulness
                                     High             Low
List                               1      2         1      2

Coefficient of Concordance (W)   .303   .346      .718   .681
Average Rank Correlation
  Coefficient (rho av)           .281   .325      .709   .671

It will be noticed that the average correlation coefficients are
quite small for the high-meaningful material, but for the low-meaningful
stimuli they are only slightly less than the mean reliability coefficients
(Table VIII) for the low-meaningful material. Also, the rankings of W and
rho av are the same within each level of meaningfulness as the rankings of
the reliability coefficients.

From these results it would seem that there was some communality
among subjects' rankings for all four sets of stimuli. However, there would
seem to be more stereotypy of response for the low-meaningful stimuli
than for the high-meaningful stimuli. In fact, there is reason to believe
that virtually all of the measured deviation from perfect agreement among
subjects about the ranking of the low-meaningful material is due to errors
of measurement, when it is considered that the average correlation
coefficient for the List 1 distance rankings is only .065 less than the
reliability coefficient, while the difference for the List 2 material is only
.027. The corresponding differences for the high-meaningful material, in
contrast, are .316 and .374 for Lists 1 and 2, respectively. If a correction
for attenuation is made to estimate what the average correlation between
subjects would be with perfectly reliable tests, the corrected rhos for the
low-meaningful material are .905 and .942, while the average correlations
for the high-meaningful material are only .470 and .464. This would seem
to indicate that there was a considerable amount of disagreement between
subjects as to the ranking of the distances between the high-meaningful
stimuli, although individual subjects showed reasonable consistency in
repeating their rankings.
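The correction for attenuation can be illustrated with a brief sketch. It assumes the classical formula, in which an observed correlation is divided by the square root of the product of the two reliabilities; since both rankings in a pair come from the same instrument, the same test-retest coefficient from Table VIII is used for each. The List 2 low-meaningful reliability is taken as 0.712, which is an assumed reading of a damaged table cell. Names are illustrative.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Classical correction: observed r divided by sqrt of the reliabilities."""
    return r_xy / math.sqrt(r_xx * r_yy)


# (average rho from Table X, test-retest reliability from Table VIII,
#  corrected value as printed in the text)
SETS = {
    "low, List 1": (0.709, 0.783, 0.905),
    "low, List 2": (0.671, 0.712, 0.942),  # 0.712 is an assumed reading
    "high, List 1": (0.281, 0.597, 0.470),
    "high, List 2": (0.325, 0.699, 0.464),
}

if __name__ == "__main__":
    for name, (rho_av, reliability, printed) in SETS.items():
        corrected = correct_for_attenuation(rho_av, reliability, reliability)
        print(name, round(corrected, 4), printed)
```

Under these assumptions the computed values agree with the printed ones to within about .001; the printed figures for the high-meaningful sets appear to be truncated rather than rounded.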

The mean ranking given to each interstimulus distance was
calculated for the low-meaningful material of List 1 and List 2; the stimulus
pairs and the means are shown in Table XI. The pairs in Lists 1 and 2 have
been matched in Table XI according to the common structure of the letter
elements of which they are composed. A Pearson product-moment correlation
coefficient of 0.910 (significant at the .01 level of probability for 13 df)
was calculated for the relationship between the two sets of mean ranks,
showing that the subjects showed a high communality of criteria between
List 1 and List 2 for rating the similarity of the low-meaningful stimulus
pairs. The pairs of stimuli ranked highest are those with identical but
transposed letter elements and the pairs differing only in their middle
letters. The lowest-ranking pairs in each set are those where no elements
are shared in the members of the pair.

Runquist and Joinson (in press) have scaled for similarity
examples of all possible combinations of common elements among pairs of CCCs.
The stimuli used in the present study represent 9 of the possible categories
of common-element similarity for which Runquist and Joinson provide scale
values; a rank correlation coefficient of 0.984 (p < .01) was found between
the similarity values of the two studies. Thus the similarity of CCCs would
appear to be largely predictable from the specification of the number and
position of common letters shared by a pair of stimuli.

To summarize the findings in this section, evidence has been
presented to show that subjects are able to rate the pair-wise similarity
of verbal stimuli with reasonable internal consistency, although a
significant decrease in consistency was found when subjects were asked to rank
a set of stimuli whose meaningfulness differed from a homogeneous series that
had been rated previously. Significant inter-set differences were found
for internal consistency of response, although the differences seemed to
be specific to a given set of stimuli rather than being governed by level
of meaningfulness. Also, large individual differences were found in internal
consistency of ranking.

Next, test-retest reliability coefficients calculated for the
control material indicated that the reliability of the test was respectable,
although the reliability for the high-meaningful material was significantly
smaller than for the low-meaningful material. The hypothesis that a
subject's ranking of the interstimulus distances could be changed by having
him learn responses to the stimuli through a paired-associate procedure was
not supported. Finally, it was shown that although significant communality
is shown in subjects' ratings of all sets of stimuli, the ranking of the
low-meaningful material seemed extremely stereotyped, while appreciable
individual differences were found in the ranking of the high-meaningful
stimuli. The similarity of the low-meaningful stimuli seems to be predictable
in terms of common elements.

TABLE XI

MEAN RANKING ASSIGNED TO STIMULUS PAIRS FROM
LOW MEANINGFUL MATERIALS OF LIST 1 AND LIST 2

Pair           Mean Rank        Pair           Mean Rank

ZXJ - ZMJ         2.00          GQK - GXK         2.59
ZXJ - MZJ         4.43          GQK - XGK         5.56
ZXJ - DJZ         6.43          GQK - ZKG         7.10
ZXJ - WZQ         7.53          GQK - WGP         8.75
ZXJ - CGP        13.57          GQK - MHB        11.37
ZMJ - MZJ         2.42          GXK - XGK         2.15
ZMJ - DJZ         6.54          GXK - ZKG         4.82
ZMJ - WZQ         7.32          GXK - WGP         9.59
ZMJ - CGP        13.09          GXK - MHB        12.70
MZJ - DJZ         5.15          XGK - ZKG         3.45
MZJ - WZQ         7.07          XGK - WGP         6.95
MZJ - CGP        12.17          XGK - MHB        12.23
DJZ - WZQ         9.46          ZKG - WGP         8.90
DJZ - CGP         9.71          ZKG - MHB        12.89
WZQ - CGP        13.17          WGP - MHB        10.89

Validity  of  Similarity  Rankings 

Having demonstrated that the reliability of the similarity
ranking procedure is sufficient to warrant further study of the data, the
next step is to examine the validity of the instrument. The content
validity would seem obvious upon an examination of the instructions to the
subjects and of the structure of the test, while determining predictive
or concurrent validity is difficult due to the problem of specifying
criteria for the similarity of the stimuli (except, perhaps, in the case of
the low-meaningful material). The validation of the procedure, therefore,
will concentrate on examining the construct validity by determining how
the ranked interstimulus similarities relate to interference processes in
paired-associate learning.

A decision had to be made as to whether all the available data
should be used in an effort to stabilize estimates of experimental
parameters with large sample sizes, or whether data selected for maximum
reliability should be used. The latter alternative was chosen. It will be
recalled that each subject did three ratings of experimental stimuli (ones
which were used as paired-associate stimuli for that subject). One was done
before the learning task, the second was a repetition of the first after
the learning task, and the third was a rating of the remaining set of
experimental stimuli that had not already been rated. It will also be
remembered that the stimuli which were rated only once, after the learning
task, showed significantly lower internal consistency of response than all
other ratings. For this reason, it was decided not to use these data in the
validation procedure. It was also found that the experimental material had
shown some evidence of a decrease in reliability, compared to the control
material. Although this difference was not large enough to reject the
hypothesis that no difference exists in the population represented by the
experimental samples, it is felt that this difference was sufficiently large
to warrant excluding these data from consideration as a stable basis for
further investigation. Accordingly, only the ratings of the experimental
material done previous to the learning task were included in the tests which
follow.

Relation Between Similarity Rankings and Confidence Ratings.
Two measures of stimulus generalization during learning were determined.
The first was the confidence ratings given to the mismatched pairs on the
recognition test. As was described earlier, each subject was presented with
16 incorrect pairings of stimuli and responses and required to give his
estimate of the probability that the pairing was incorrect. Each
stimulus-response mispairing can be considered to correspond to a particular
stimulus pairing, this stimulus pairing consisting of the stimulus that had
been presented in the mismatch and the stimulus associated with the response
that had been presented in the mismatch. Eight of the stimulus-response
mispairings from the recognition test thus correspond to stimulus pairs that
had been ranked for similarity before the learning task, and so provide an
opportunity to calculate a rank correlation coefficient expressing the
relationship between rated similarity of a stimulus pair and an estimate of
the strength of the generalization tendencies between two stimuli.

It had been hoped that an individual correlation coefficient
could be calculated for each subject. However, this procedure did not
provide much useful information, due to the high frequency of tied ranks.
It can be seen from Table XII that the subjects used only a few of the
possible numbers between 0 and 100 in their responses on the recognition
test; even after two learning trials almost half the ranks were tied. To
obtain more stable estimates, means of the confidence ratings given to
the first-ranked stimulus pairs, the second-ranked pairs, and so on, were
calculated over the 16 groups of four subjects each who had been tested on
the same material after the same number of trials of practice. Rank-order
correlation coefficients were then calculated between the ranking of the
stimulus pairs and the ranked mean confidence ratings.

Of the 16 groups, only three of the four groups who had been
tested after four trials showed any significant correlation between
similarity rankings and confidence ratings. The 8- and 16-trial groups had
obviously learned most of the pairings correctly as they showed homogeneous
and virtually perfect response, while the 2-trial group who had probably
learned very little gave a wide, randomly-distributed spread of responses.
Table XIII shows the 4-trial group's mean confidence ratings for the
first-ranked stimulus-response pairing, the second-ranked pairing, and so on,
with the ranking of the pairs determined by each subject's similarity
ranking of the corresponding stimulus pairs. It also shows the value of the
rank correlation coefficient for each group. Table XIII shows that the
correlations are significant for three of four groups, and that the
correlation for the combined data of all groups is quite substantial.
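The rank correlations reported in Table XIII can be illustrated with a minimal sketch. It assumes the simple Spearman formula, rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), with tied values given the mean of the ranks they occupy and no further tie correction; whether the original computation applied a tie correction is not stated. The function names are illustrative, and the data are the List 1 high-meaningful and Mean columns of Table XIII.

```python
def average_ranks(values):
    """Rank values in ascending order; ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho from squared rank differences (no tie correction)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


SIMILARITY_RANK = [1, 2, 3, 4, 5, 6, 7, 8]
HIGH_LIST_1 = [52.5, 63.8, 30.0, 38.8, 51.3, 66.3, 60.0, 72.5]
MEAN_COLUMN = [54.3, 56.6, 50.1, 71.4, 75.6, 80.3, 80.6, 86.6]

if __name__ == "__main__":
    print(round(spearman_rho(SIMILARITY_RANK, HIGH_LIST_1), 2))
    print(round(spearman_rho(SIMILARITY_RANK, MEAN_COLUMN), 2))
```

For these two columns the sketch gives values that round to the .50 and .93 printed in the table.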

TABLE XII

MEAN NUMBER OF CATEGORIES USED BY SUBJECTS
IN CONFIDENCE RATINGS

Trials          2        4        8       16

Mean          4.19     3.00     2.06     1.81
S.D.          0.94     1.06     1.25     1.02

TABLE XIII

MEAN CONFIDENCE RATINGS AND RANK CORRELATION COEFFICIENTS
FOR RANKED STIMULUS-RESPONSE MISPAIRINGS AFTER
FOUR LEARNING TRIALS

                           Meaningfulness
                     High                  Low
Rank           List 1    List 2      List 1    List 2       Mean

 1              52.5      77.0        55.0      32.5        54.3
 2              63.8      82.0        45.8      35.0        56.6
 3              30.0      89.5        40.8      40.0        50.1
 4              38.8      93.8        80.8      72.5        71.4
 5              51.3      96.3        85.0      70.0        75.6
 6              66.3      95.0        65.0      95.0        80.3
 7              60.0      95.0        97.5      70.0        80.6
 8              72.5      96.3        97.5      80.0        86.6

Mean            54.4      90.6        70.9      61.9        69.4

Rho              .50       .89**       .83**     .81*        .93

*  significant at the .05 level of probability

**  significant at the .01 level of probability

Relation between Similarity Rankings and Overt Response Confusion.
The second measure of generalization was the frequency of response confusions
shown on the first seven recall trials by the 8-trial and 16-trial groups.
This particular combination of number of trials and groups was used because
it provided the largest total sample of response confusions where all
subjects had an equal chance to respond on all trials. By "response
confusion" is meant the substituting of a response associated with another
stimulus from within a given set of pairs for the correct response to a
particular stimulus. If Stimulus A elicited the response that was paired with
Stimulus B, or if the response for Stimulus A was given to Stimulus B, this
error was attributed to a lack of discrimination between Stimulus A and
Stimulus B. The number of response confusions that occurred to the
highest-ranked interstimulus distance, to the second-ranked distance, and so
on were then calculated. If two distances had been tied for a given ranking,
then half an error was scored to each rank.
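The scoring rule for tied distances can be sketched as a simple tally. This is a hypothetical illustration: the input format (one tuple of rank positions per confusion error) and the function name are assumptions, and extending the half-error rule to ties among more than two distances by splitting the error equally is also an assumption, since the text describes only the two-way case.

```python
from collections import defaultdict


def tally_confusions(errors):
    """Count confusion errors per similarity rank, splitting tied ranks.

    Each element of `errors` is a tuple of the rank positions occupied by
    the interstimulus distance involved in one confusion: a single rank
    when the distance is untied, two ranks when it is tied with another.
    """
    tally = defaultdict(float)
    for ranks in errors:
        share = 1.0 / len(ranks)  # half an error each for a two-way tie
        for rank in ranks:
            tally[rank] += share
    return dict(tally)


if __name__ == "__main__":
    # Hypothetical errors: one at rank 1, two on a distance tied for
    # ranks 2 and 3, and one at rank 15.
    print(tally_confusions([(1,), (2, 3), (2, 3), (15,)]))
```

Splitting the error keeps the total across ranks equal to the number of confusions observed, so rank totals remain comparable across subjects with different numbers of ties.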

As only 166 errors of the type described above occurred on the
trials considered, resulting in many tied ranks in the separate analysis
of the four sets of stimuli, the data from the List 1 and List 2 subjects
were combined to provide more stable estimates. The rankings of the mean
(N = 8) frequency of response confusions attributed to each of the 15
rankings of interstimulus distances are shown in Table XIV for the high
and low-meaningful stimuli, along with the associated rank correlation
coefficients. It can be seen that the correlations for both sets of data
are significant.


Effects of Similarity Rating Procedure upon Paired-Associate Learning

Having earlier examined the effects upon similarity ranking
performance of prior practice at a paired-associate learning task, it would
be of interest to study the converse situation. As each subject had rated
for similarity half of the stimuli that occurred in the paired-associate
list before starting practice, the opportunity is available to determine
whether there are any differences in performance with these two sets of
stimuli.
As was described earlier, four groups of 16 subjects each
practiced the paired-associate list for either 2, 4, 8 or 16 trials.
However, as each subject had a recognition test after his last practice trial
and a recall test after each practice trial before that, we actually have
recall data from the four groups for 1, 3, 7 and 15 trials. Also, if we
wished to examine performance after one practice trial, we would have
N = 64, whereas for the data on Trials 8 - 15 we have N = 16. It is
possible to examine the data over 15 trials with the realization that N
decreased as trials increased, but it would be difficult to perform a
valid analysis of variance on data of this type. Various combinations of
groups will be selected for analysis, therefore, so that all subjects
have equal opportunity to contribute to the data under all conditions.
It should be realized, of course, that this decision necessitates a
compromise on any given analysis between the number of subjects included
and the number of trials over which the analysis extends, according to the
hypothesis being considered.

The first analysis will include data from all 15 trials from 16
subjects. Table XV shows the summary of an analysis of variance on the
mean number of correct responses to each set of stimuli on each recall test
as a function of list, meaningfulness, trials, and whether or not the
stimuli had been rated for similarity previously (familiarization).

Table XV shows that only the effects of meaningfulness, trials,
and the interaction between meaningfulness and trials are significant.
Figure 4 gives the data for these conditions graphically. It can be seen
that the high-meaningful material elicits more correct responses than the
low-meaningful material, performance improves as a function of trials, and
the performance for both levels of meaningfulness approaches the asymptote
of perfect performance near the end of practice, all findings that are
highly predictable from previous data.

It can be seen that the F-ratio for the factor of familiarization
is substantial, although not significant. As was outlined earlier in this
paper, it is expected that the effects of familiarization are greatest in
the early stages of practice. It is thus possible that familiarization
has a significant effect on the data of the earlier trials, but that this
effect is obscured when the data over all trials are summed. This
hypothesis was tested by applying the same analysis of variance design to
the data of Trials 1 - 7 only, which also permits a more stable estimate
of the effects of the experimental variables by increasing the number of
subjects to N = 32.

Table XVI shows a summary of this analysis. It can be seen that
meaningfulness and trials effects are still significant. It can also be
seen that familiarization has had a significant effect. From Figure 5 it
can be seen that the familiarized set of stimuli elicit fewer correct
responses than do the non-familiarized stimuli. It would seem reasonable to
conclude from the data of Trials 1 - 7 that familiarization has had a
depressing effect on the association of responses with stimuli in the early
portion of the learning task. Whether the absence of this difference in
the data of Trials 1 - 15 is due to an obscuring of the effect by the
additional trials or to the lesser sensitivity of this analysis due to the
smaller sample cannot be determined.
Figure 4. Mean number of responses correctly recalled over fifteen
trials as a function of meaningfulness and trials
TABLE XV

SUMMARY OF ANALYSIS OF VARIANCE OF MEAN NUMBER OF
CORRECT RESPONSES ON RECALL TESTS OVER FIFTEEN LEARNING
TRIALS

Source                              df        SS        MS        F

Between subjects
  List (L)                           1    75.208    75.208     2.32
  Familiarization (F)                1    93.633    93.633     2.89
  L X F                              1     9.633     9.633      < 1
  Subjects within groups            12   388.383    32.365        -

Within subjects
  Meaningfulness (M)                 1    83.333    83.333    15.72**
  L X M                              1     9.633     9.633     1.82
  F X M                              1     3.675     3.675      < 1
  L X F X M                          1     3.008     3.008      < 1
  M X Subjects within groups        12    63.617     5.301        -

  Trials (T)                        14   834.992    59.642    54.28**
  L X T                             14    20.792     1.485     1.35
  F X T                             14    22.992     1.642     1.49
  L X F X T                         14     2.742     0.196      < 1
  T X Subjects within groups       168   184.617     1.099        -

  M X T                             14    42.792     3.057     3.77**
  L X M X T                         14     8.242     0.589      < 1
  F X M X T                         14    12.825     0.916     1.13
  L X F X M X T                     14    13.492     0.964     1.19
  M X T X Subjects within groups   168   136.383     0.812        -

Total                              479

**  Significant at the .01 level of probability
TABLE XVI

SUMMARY OF ANALYSIS OF VARIANCE OF MEAN NUMBER OF CORRECT
RESPONSES ON RECALL TESTS OVER SEVEN LEARNING TRIALS

Source                               df         SS         MS          F

Between subjects
  List (L)                            1     29.009     29.009      2.379
  Familiarization (F)                 1     52.938     52.938      4.342*
  L X F                               1      6.036      6.036      < 1
  Subjects within groups             28    341.411     12.193        -

Within subjects
  Meaningfulness (M)                  1    198.223    198.223     45.653*
  L X M                               1      9.143      9.143      2.106
  F X M                               1      0.893      0.893      < 1
  L X F X M                           1      0.437      0.437      < 1
  M X Subjects within groups         28    121.589      4.342        -

  Trials (T)                          6    447.085     74.514     76.898*
  L X T                               6      6.022      1.004      1.036
  F X T                               6     11.906      1.984      2.047
  L X F X T                           6      1.433      0.239      < 1
  T X Subjects within groups        168    162.839      0.969        -

  M X T                               6      5.246      0.874      1.118
  L X M X T                           6      9.201      1.533      1.960
  F X M X T                           6      6.513      1.086      1.389
  L X F X M X T                       6      6.344      1.057      1.352
  M X T X Subjects within groups    168    131.411      0.782        -

Total                               447

*  Significant at the .05 level of probability
** Significant at the .01 level of probability
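
The F ratios in Table XVI follow from the tabled sums of squares and degrees
of freedom: each mean square is SS/df, and each F is the mean square of the
effect divided by the mean square of its error term. The sketch below
recomputes three of them in Python. The function name f_ratio is
hypothetical, and the pairing of each effect with its error term
(Familiarization against Subjects within groups, Meaningfulness against
M X Subjects within groups, Trials against T X Subjects within groups) is an
assumption read from the layout of the table. Because the table appears to
have been computed from rounded mean squares, the recomputed values agree
with it only to within rounding error.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Return F = MS_effect / MS_error, where each MS is SS / df."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


# SS and df values copied from Table XVI (Trials 1-7, N = 32).
f_familiarization = f_ratio(52.938, 1, 341.411, 28)   # tabled F is 4.342
f_meaningfulness = f_ratio(198.223, 1, 121.589, 28)   # tabled F is 45.653
f_trials = f_ratio(447.085, 6, 162.839, 168)          # tabled F is 76.898

for name, value in [("Familiarization", f_familiarization),
                    ("Meaningfulness", f_meaningfulness),
                    ("Trials", f_trials)]:
    print(f"{name}: F = {value:.2f}")
```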
CHAPTER IV

Discussion 

In  the  following  section  the  major  findings  of  the  study  will  be  reviewed. 
This  will  be  followed  by  a  discussion  of  some  questions  arising  from  the 
results.

Review  of  Results 

The  procedure  of  this  study  had  subjects  rank  for  similarity  60 
out  of  105  possible  pairs  of  distances  between  six  stimuli.  Through  the 
use  of  appropriate  assumptions  it  was  found  that  a  ranking  could  be  derived 
for over 99% of the total 105 pairs of distances from the information contained
in the 60 rated pairs. There was enough redundancy in the experimental
data to provide a partial check on one of these assumptions: that
the common ordering of the 15 distances that made up the 105 pairs was
transitive. Fewer than 5% of the observed rankings conflicted with this
assumption. The majority of subjects apparently show high internal consistency
in their similarity ratings.
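
The role of the transitivity assumption can be illustrated with a small
Python sketch. It shows only the general idea: if distance a is ranked as
more similar than b, and b as more similar than c, then a can be placed
ahead of c without a direct comparison, and any pair ordered in both
directions is a conflict with transitivity. It is not the study's actual
derivation procedure, whose other assumptions are not restated in this
section. The function names and the toy distance labels are hypothetical.

```python
def transitive_closure(observed):
    """Return every ordered pair (a, b) implied by chaining the observed pairs.

    Each pair (a, b) means "distance a was ranked as more similar than b".
    """
    closure = set(observed)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def conflicts(closure):
    """Return the unordered pairs that end up ordered in both directions."""
    return {frozenset((a, b)) for a, b in closure
            if a != b and (b, a) in closure}


# Toy data: three direct comparisons among four distances.
observed = {("AB", "AC"), ("AC", "BC"), ("BC", "BD")}
derived = transitive_closure(observed)
print(len(observed), "observed comparisons order", len(derived), "pairs")

# A cyclic (intransitive) set of judgments is flagged as conflicting.
cyclic = {("AB", "AC"), ("AC", "BC"), ("BC", "AB")}
print(len(conflicts(transitive_closure(cyclic))), "conflicting pairs")
```

In the toy data the three direct comparisons are enough to order all six
pairs of the four distances, which is the sense in which a partial set of
ratings can determine most of a full ranking.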

The consistency of subjects' ratings was studied as a function of
several variables. Reliable differences in consistency of response over
several ratings were demonstrated for individual subjects. Mean consistency
of response dropped when the meaningfulness of the material
was changed after the subject had rated several stimulus sets at a given
level of meaningfulness. Although the two sets of stimulus material had
been selected so as to be as equivalent as possible, sample-specific differences
in consistency of rating were found. These differences were not related
to the meaningfulness per se of the material, however. Nor were any
differences found as a result of previously using the stimuli in a
paired-associates learning task.

The test-retest reliability of the similarity rankings was
found to be moderately high; the reliability of the low-meaningful material
was greater than that of the high-meaningful material. No significant
change  in  reliability  was  found  as  a  result  of  using  the  stimuli  in  a 
paired-associates  learning  task  between  ratings. 

It  was  found  that  there  was  a  high  degree  of  agreement  between 
subjects on the similarity ratings of the low-meaningful material. Furthermore,
a correlation of virtually unity was found between the mean similarity
ratings of the low-meaningful stimuli in this experiment and those of
another similarity rating experiment, even though different material and
different rating procedures were used. Apparently common standards of
similarity for low-meaningful material prevail in the population sampled
in the present experiment. A low level of concordance of rating was found
between subjects for the high-meaningful material, however, even when a
correction was made for the reliability of the rating procedure. This
would seem to indicate that subjects' standards for the similarity of the
meaningful material studied in this experiment tended toward the idiosyncratic.
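
Agreement between subjects on a ranking can be indexed in several ways, and
this section does not restate which statistic was used. As an illustration
only, the Python sketch below computes Kendall's coefficient of concordance
(W), one standard index, for complete rankings without ties. The function
name and the toy rankings are hypothetical and are not data from the study.

```python
def kendalls_w(rankings):
    """Kendall's W for m complete, tie-free rankings of the same n items.

    rankings[j][i] is the rank (1..n) that rater j gives to item i.
    W = 12 * S / (m**2 * (n**3 - n)), where S is the sum of squared
    deviations of the item rank sums from their mean.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Three raters in complete agreement (a fully stereotyped ranking).
stereotyped = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
# Two raters whose rankings are exact reversals of each other.
opposed = [[1, 2, 3, 4], [4, 3, 2, 1]]

print(kendalls_w(stereotyped), kendalls_w(opposed))
```

W ranges from 0 (no agreement among raters) to 1 (identical rankings), so a
stereotyped standard of similarity would appear as a value near 1 and
idiosyncratic standards as a value nearer 0.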

No significant relationship was found between stimulus similarity
and confusion errors on the recognition test after 2, 8, and 16 trials of
training. After four training trials, however, three out of four sets of
stimuli (one high-meaningful, two low-meaningful) and the pooled data of
the four sets showed a reliable relationship between rated similarity and
confusion errors. In addition, the pooled data for the high-meaningful and
the low-meaningful stimuli showed a significant relationship between rated
similarity and overt intra-list intrusions over the first seven trials.

It was also found that subjects associated fewer correct responses on Trials
1-7 to stimuli that were rated before practice on the learning task than
to stimuli that were rated after learning.

Evaluation of Similarity Ranking Test

It  would  seem  that  the  similarity-rating  procedure  studied  in 
this  experiment  has  some  merit  as  a  tool  in  future  research.  The  test  was 
shown  to  have  reasonable  internal  consistency  and  test-retest  reliability 
over  short  periods  for  material  of  high  and  low  meaningfulness,  and  to  be 
able to predict well the occurrence of both intra-list intrusions and
recognition errors in paired-associate learning.

It is even more encouraging to note the potentialities for improvement
that seem possible in the test. As was stated earlier, a compromise
between quality and quantity was struck in the planning of this
experiment: in order to determine the effects of a number of variables
without demanding too much in terms of time from the subjects, a relatively
short version of the similarity-rating procedure was used. A minimum of
over-determination of the data was planned, with 60 out of a possible 105
paired comparisons being made, each once only.
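
The figures of 60 and 105 can be checked by enumeration. Six stimuli give 15
inter-stimulus distances and therefore 105 pairs of distances. Assuming, as
the triad rating form suggests, that every set of three stimuli was
presented once and yielded the three comparisons among its three distances,
the 20 triads supply 60 comparisons. The Python sketch below counts them;
the stimulus labels A to F are hypothetical placeholders.

```python
from itertools import combinations

stimuli = "ABCDEF"                               # six placeholder stimulus labels
distances = list(combinations(stimuli, 2))       # every inter-stimulus distance
all_pairs = {frozenset(p) for p in combinations(distances, 2)}

rated = set()
triads = list(combinations(stimuli, 3))
for triad in triads:
    sides = combinations(triad, 2)               # the three distances in the triad
    for pair in combinations(sides, 2):          # the three comparisons among them
        rated.add(frozenset(pair))

unrated = all_pairs - rated
print(len(distances), len(all_pairs), len(triads), len(rated), len(unrated))
```

Under this assumption the 45 pairs that are never compared directly are
exactly those in which the two distances share no stimulus, so their order
has to be inferred from the rated pairs.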


If  it  were  wished,  however,  to  get  a  more  reliable  inter-stimulus 
distance  ranking  for  a  few  stimuli  in  the  same  amount  of  time  in  preference 
to  obtaining  data  on  a  larger  number  of  stimuli,  there  would  seem  to  be 
no reason why this could not be done. The procedure used in this study
might be repeated a number of times to obtain a more stable estimate of the
rankings. What would probably be better would be to use another of the
cartwheel data collection methods discussed by Coombs (1964) that provide
more redundant information, and thus also more thoroughly check the internal
consistency of the data. The present study used what Coombs terms a Case
1 cartwheel design, where the basic test involves all three possible comparisons
of pairs among three stimuli. The use of a Case 3 cartwheel design
(where a given "hub" stimulus is paired with three "rim" stimuli on the
basic test and the three combinations of pairs are ranked) in a ranking
test would give three independent rankings of each pair of stimulus pairs
(each ranking in a different context), and presumably take three times as
long for the subject to do.
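
The claim of three independent rankings per pair can be checked by
enumeration. The Python sketch below is based only on the description of
the Case 3 design given above and may not match Coombs' (1964) definition
in every detail: each basic test takes one hub stimulus and three rim
stimuli, and the three hub-rim distances are compared with one another. The
stimulus labels and variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations

stimuli = "ABCDEF"               # six placeholder stimulus labels
contexts = Counter()             # how often each pair of distances is compared
n_tests = 0

for hub in stimuli:
    rims = [s for s in stimuli if s != hub]
    for rim_triple in combinations(rims, 3):
        n_tests += 1
        # The three hub-rim distances presented on this basic test.
        spokes = [frozenset((hub, rim)) for rim in rim_triple]
        for pair in combinations(spokes, 2):
            contexts[frozenset(pair)] += 1

print(n_tests, len(contexts), sum(contexts.values()), set(contexts.values()))
```

With six stimuli this enumeration covers the same 60 pairs of distances
that share a stimulus, each compared in three different contexts, for 180
comparisons in all. That is three times the 60 comparisons of the present
procedure, consistent with the estimate that the test would take about
three times as long.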

Another  aspect  of  the  evaluation  of  this  test  that  would  seem 
fertile  ground  for  improvement  is  the  recognition  test  procedure.  It  will 
be  recalled  that  the  similarity  ranking  data  were  validated  in  part  by 
correlation  with  the  subjects'  ratings  of  their  certainty  that  a  pair  had 
been  changed.  One  of  the  disappointments  of  this  experiment  was  the  failure 
of  the  majority  of  the  subjects  to  make  use  of  more  response  categories  on 
the  recognition  test.  As  a  result,  data  from  groups  of  four  subjects  had 
to be pooled to provide reasonable variation in this estimate of generalization.

One  possible  explanation  for  this  finding  is  that  the  subjects 
showed  response  bias  due  to  their  inexperience  with  making  the  type  of 
probability  judgment  required  in  this  experiment.  This  possible  response 
bias might be eliminated by giving subjects practice in probability estimation
prior to the experiment. Alternatively, a different form of response
might be used by subjects to indicate judgments, such as placing a mark on a
line (cf. Garskof and Houston, 1963). Assuming that a change in procedure
would produce greater variation in judgments, it would remain to be determined,
of course, whether fineness of judgment increased as well.
The Determinants of Similarity

Besides the validation of the similarity rating procedure, the
present study provides some other interesting information, especially on
the determinants of similarity in material of high and low meaningfulness.
Runquist & Joinson's (in press) finding that rated similarity of low-meaningful
material is determined by the sharing and position of common elements
was confirmed. Although this study did not use as large or representative
a sample of stimuli as did Runquist and Joinson, the correlation between
the mean similarity ratings of the stimulus-types that did occur in both
studies was virtually unity. The similarity of the low-meaningful material
used  in  this  study  seems  to  be  determined  by  highly  stereotyped  standards. 
These  standards  seem  to  be  predictable  in  terms  of  a  common-elements 
theory  of  similarity,  where  the  elements  are  letters. 

The  same  cannot  be  said  about  the  ratings  of  the  high-meaningful 
material,  however.  This  study  has  presented  evidence  that  similarity 
rankings  of  high-meaningful  verbal  stimuli,  when  compared  to  low-meaningful 
stimuli, show (a) less inter-subject consistency of response, i.e., agreement
among subjects about a stereotyped ranking, (b) poorer short-term test-retest
reliability within subjects, and (c) no difference in internal consistency of
ranking. This last point is quite important, because it apparently eliminates
a simple explanation of the first two effects: that similarity of
meaningful words, as measured by the present technique, is essentially
unstable. The fact that subjects are capable of showing short-term consistency
of response when ranking high-meaningful stimuli equal to that
shown while ranking low-meaningful material would seem to require a more
complex explanation. The results would seem to support the hypothesis
presented earlier in this paper that subjects differ in their criteria in
judging similarity of meaningful words because the characteristics which
determine meaningful similarity are learned, and hence these criteria vary
between subjects with different experience. They also seem to indicate
that no more change in judgment criteria is found during a given ranking
test for high-meaningful material than for low-meaningful material, but
that in approximately half an hour subjects will change to a significantly
different set of criteria, which show no change in internal consistency.
However,  although  this  interpretation  of  the  data  would  seem  to  be  the 
simplest  available,  several  points  should  be  examined  carefully. 

First, this interpretation involves, in part, accepting the null
hypothesis, a dubious procedure. It means that we consider that the observed
lack of significant difference in internal consistency between the
rankings of the samples of high- and low-meaningful material is representative
of the populations from which these samples were drawn. However,
it will be recalled that the low-meaningful material showed slightly, although
not significantly, greater internal consistency. Second, it will
also be recalled that significant differences were found in internal consistency
within the four sets of similarity rankings, although these seemed
to be specific to the sets and not related to level of meaningfulness or
list alone.

It would seem, then, that we can conclude that the evidence
favors the hypothesis that the rankings of the high- and low-meaningful
materials in this experiment differ in both between-subjects consistency
and test-retest consistency over short intervals. However, no compelling
evidence could be found concerning differences in internal consistency of
the ranking of a single set of stimuli.

The conclusion that high inter-subject variability is typical of
similarity rankings for high-meaningful material might also be studied
more carefully, but for different reasons. There is no reason to
believe that serious sampling error occurred when the high-meaningful
material was selected. However, it must be remembered that only one
limited class of meaningful words, adjectives, was sampled. It is
entirely possible that verbal units with a more concrete denotative function
(e.g., concrete nouns) might be ranked with much greater inter-subject
consistency. This point becomes all the more worthy of consideration when
it is recalled that only high-frequency adjectives were used. It would
seem quite plausible that words that are widely used in everyday experience
might be so selected because of their flexibility and versatility of semantic
function, and that adjectives with a narrower range of usage might be
represented by less variable points in semantic space.

Familiarization of Stimuli

The results of this study indicated that stimuli, both of high
and low meaningfulness, which had been rated for similarity prior to
the learning task had fewer responses learned to them than did the non-rated
stimuli in the initial stages of practice on a paired-associate list. The
results of this study support others that have presented evidence contrary
to the hypothesis that experience with stimulus items prior to their use
in a paired-associate learning task facilitates the learning of responses
to these stimuli.

There are a number of possible explanations for the observed
disconfirmation of this hypothesis. Previous experience with stimuli is
presumed to facilitate learning because it provides the subject with an
opportunity to improve his discrimination of the stimuli prior to actual
practice with the list. Two schools of thought exist as to the nature of
this proposed prior discrimination. One maintains that distinctive responses
are learned to the stimuli, thus increasing their differentiation
by the creation of new, more easily-discriminated stimulus complexes. The
other postulates that prior experience with the stimuli enables the subject
to learn to attend to their distinguishing features and thus eliminates
some of the confusion between stimuli that previously existed. According
to the first theory any operation which attaches distinctive responses to
the stimuli will facilitate subsequent paired-associate learning. The
second theory maintains that associating distinctive responses to the stimuli
is not necessary for subsequent facilitation; any operation that encourages
the subject to attend to the distinctive features of each stimulus
would achieve this end.

Three hypotheses, related to the above theories, might explain
why inhibition, rather than facilitation, was found in this study. The
first hypothesis is that associations to the stimuli acquired through
similarity rating produced more direct interference with the responses to
be learned on the paired-associate task than they produced facilitation
through acquired distinctiveness. In other words, the responses learned
to the stimuli in the acquired-distinctiveness training, although they may
have produced some facilitation by making the stimuli more discriminable,
produced a net inhibition effect by intruding during paired-associate
practice and blocking the acquisition of the correct response. Jung (1967)
has suggested that familiarization techniques cause incidental associations
between the items being familiarized. These incidental associations in
turn cause interference with the learning of responses in the paired-associates
task. It is quite possible that this is an explanation for the
inhibition of learning found in this experiment.
A second hypothesis also has some plausibility when the procedure
of the present study is considered. As Goss and Nodine (1965) have pointed
out, one of the operations that has been used to produce acquired distinctiveness
(simple presentation of the stimulus, on the assumption that
eliciting a "recognition response" will cause increased integration of this
response and increase the stimulus' discriminability) is the same as the
operation used to produce "semantic satiation" (Lambert & Jakobovits, 1960).
Semantic  satiation  presumably  involves  a  loss  of  meaning  for  the  stimulus, 
which  presumably  would  hinder  the  association  of  a  response  with  it. 

It is difficult to conceptualize the mechanism of semantic satiation
producing the results observed in this experiment when it is recalled
that no difference in inhibitory effects was observed between the high-meaningful
stimuli, which presumably have a good deal of meaning to lose,
and the low-meaningful stimuli, which by definition are virtually meaningless.
However, it is possible that habituation involving a novelty rather
than a semantic factor might be playing a significant part here. It will
be recalled that half of the stimuli on the paired-associate list had been
familiarized (by similarity ranking), while the other half had not. It is
possible that the "new" stimuli might have elicited novelty reactions that
in some way facilitated the association of a response, while the familiarized
stimuli would not be perceived as vividly and would not be attended to
as intensely as the "new" stimuli. However, when it is remembered that the
"new" and "old" stimuli were always of opposite extremes of meaningfulness
for each subject, it would seem likely that differential effects of familiarization
would be observed between the low-meaningful stimuli, which should
have a high initial novelty value and hence more to lose, and the high-meaningful
stimuli, whose level of familiarization would presumably be near
asymptote. This difference was not observed in the data of this experiment.
It may be possible that the test of this hypothesis was not sensitive
enough to reveal this difference, but this particular explanation of
the experimental data will remain uncertain until this anomaly can be
adequately resolved by supporting evidence.

A  third  possible  explanation  for  the  effects  of  the  familiarization 
procedure concerns the nature of the similarity ranking task. It was mentioned
earlier that one theory postulates that discrimination between stimuli
improves with experience with the stimuli because the subject attends more
to the distinctive features of each stimulus as practice increases, and
learns to ignore or not respond to features that do not distinguish different
stimuli. This attention to distinctive features is obviously beneficial
in paired-associate learning where a different response must be associated
to each stimulus. But it is quite likely that the similarity ranking task,
far from encouraging subjects to search for and attend to distinctive
features  of  the  stimuli,  actually  encouraged  the  opposite:  searching  for 
and  attending  to  similar  features  of  the  stimuli,  while  ignoring  the  unique 
and  hence  distinctive  features.  This,  of  course,  is  what  the  instructions 
required  the  subjects  to  do,  and  it  is  possible  that  sufficient  practice  at 
searching  for  similarities  would  produce  a  form  of  acquired  equivalence  of 
stimuli  through  an  unlearning  of  mediating  responses  to  the  unique  features 
of  each  stimulus. 

This  last-mentioned  hypothesis  suggests  an  experimental  test. 
Subjects  could  be  given  a  modified  form  of  the  instructions  used  in  the 
interstimulus-distance  ranking  portion  of  this  experiment  by  changing  the 
word  "similar"  to  "different"  and  the  word  "similarity"  to  "difference". 
This would change the procedure to a difference-rating rather than a
similarity-rating one. By comparing the learning scores of subjects who
underwent this modified procedure to the scores of control subjects and of
subjects who underwent a replication of the procedure of the present study,
data relating to two questions could be obtained:

(1)  Do  subjects  who  participate  in  a  difference-ranking  procedure  (acquired 
distinctiveness  training)  subsequently  learn  responses  to  the  rated  stimuli 
more  quickly  than  controls,  and  do  subjects  who  attempt  a  similarity-ranking 
task  (acquired  equivalence  training)  learn  responses  more  poorly? 

(2) Is an interstimulus-distance ranking scale obtained through similarity
ratings  different  from  one  derived  from  difference  ratings? 

Reduction of Intra-list Interference

It  has  been  demonstrated  that  the  average  person  is  capable  of 
remembering  perfectly  a  single  pair  of  words,  or  even  several  pairs,  after 
he  has  been  presented  with  them  only  once  (Miller,  1956).  However,  when 
over  a  certain  number  of  pairs  are  presented  once  and  the  memory  of  the 
pairings  subsequently  tested  it  is  typically  found,  as  in  this  experiment, 
that  recall  is  less  than  perfect.  Presumably  attempting  to  learn  several 
things  in  close  temporal  proximity  produces  mutual  interference  among  the 
separate  items. 


Gibson (1940) has proposed that an important aspect of this
interference is caused by generalization between the stimulus items in the
paired-associates list. According to her hypothesis, before the subject's
first attempt to learn a paired-associates list the associative bonds between
stimuli and responses should be negligible if he has not experienced
the pairings before. After he has been exposed to the pairs for a few
trials, the strength of the association between each stimulus and its correct
response will increase, resulting in a growth of habit strength of the
correct responses. However, if the stimuli are similar to each other the
subject will confuse them, causing him to associate wrong (or generalized)
responses to a given stimulus as well. In this way the habit strength of
both the correct and the incorrect responses will increase during the
initial trials of learning, resulting in interference with the correct
responses. But as the frequency of elicitation of the correct responses
increases, and with it the frequency of the generalized ones, differentiation
between the stimuli will also increase. This is because the correct responses
are reinforced when they are produced, increasing their habit strength, while
the incorrect responses are not reinforced, resulting in their eventual
extinction. The net effect, according to Gibson, will be that interference
due to stimulus generalization will increase in the early stages of paired-associates
learning with the growing habit strength of the correct responses.
It will reach a peak at some intermediate stage of practice and then subsequently
decrease as differentiation increases.

Gibson (1942) proposed that frequency of overt intra-list response
intrusions be used as an index of stimulus generalization in paired-associates
learning. As the typical paired-associates learning experiment shows
a rise, then a fall, in frequency of occurrence of overt response intrusion
errors as a function of learning trials (Murdock, 1958), until recently it
has been considered that Gibson's hypothesis concerning the rise and fall
of stimulus generalization has, in general, been supported by data. However,
Murdock (1958) has pointed out that the process of response production,
correct or incorrect, is usually imperfect at the beginning of the typical
paired-associates experiment and improves with practice. It is generally
considered (Underwood and Schulz, 1960) that integration or differentiation
of the response items could interfere with the formation of stimulus-response
bonds in the early portions of the typical paired-associates learning
procedure. A valid test of Gibson's hypothesis should, according to
Murdock  (1958),  employ  some  means  of  avoiding  the  possible  confounding 
effect  of  response  acquisition  in  the  early  stage  of  paired-associates 
learning. 


Muir (1963) used a recognition procedure similar to the one of
the present study in an attempt to achieve this end. After varying numbers
of practice trials on a paired-associates list subjects were presented with
correct and incorrect pairings and asked if the pairings were the same as
or different from the ones on the practice list. The subjects were told to
respond only if they were "reasonably sure" that they could or could not
detect a change, and to respond accordingly if they were uncertain. It
was found that misrecognitions of incorrect pairings did not increase
and then decrease as would be expected from Gibson's (1940) theory, but
instead were at a maximum after the first practice trial and decreased
monotonically thereafter.

The results of the present experiment, however, could be interpreted
as support for Gibson's (1940) hypothesis. Although many misrecognitions
of incorrect pairings occurred after the first recognition test
(i.e., after the second practice trial), no reliable connection between
rated stimulus similarity and recognition errors could be demonstrated for
these data. This would seem to indicate that stimulus generalization could
not be used as an explanation for errors at this point (Trial 2) in learning.
But a reliable relationship was shown between stimulus similarity and recognition
errors for the data of the second recognition test, after Trial 4.
For the two subsequent tests after Trial 8 and Trial 16 virtually no recognition
errors occurred.

It could be concluded, then, that although both the present study
and Muir (1963) show that recognition errors for mispairings in paired-associates
learning decrease monotonically as a function of practice, the
data of this study seem to indicate that misrecognitions which can be reliably
attributed to stimulus generalization are at a maximum at some intermediate
stage of practice. However, no explanation seems apparent for the
large number of misrecognitions that occurred on Trial 2 of the present
study, and for the fact that Muir (1963) could not reliably demonstrate
that stimulus generalization had been a factor in causing recognition errors.
It seems apparent that variables other than stimulus generalization have an
effect on recognition errors in the early stages of paired-associates
learning, and it is possible that the effects of stimulus generalization
are obscured and hence not detectable after Trial 2 of the present study.
The nature of these other variables is not apparent from the data presented
here.
Appendix A


Sample  Stimulus  Similarity  Rating  Form 


[Figure: sample rating form. Each item is a triangle of three words with a circle between each pair of words. Words legible on the form: FULL, SLOW, SOFT, HARD, MILD, WISE.]
Appendix B

Instructions  for  Rating  Similarity  of  Stimuli 

On  the  following  pages  are  a  number  of  sets  of  three  words.  Each 
set  will  be  arranged  in  a  triangle  like  the  example  below.  You  are  to 
decide: 

(a)  which  two  of  the  three  words  are  most  similar  to  each  other, 

(b) which two of the three words are least similar to each other.

Indicate your choice by putting a plus sign (+) in the circle between the
two words that you think are most similar, and a minus sign (-) in the
circle between the two words you consider to be least similar.

It  is  up  to  you  to  decide  what  property  of  the  words  you  will  use 
to  judge  their  similarity  by.  However,  don’t  spend  too  much  time  in  making 
a  decision —  your  first  impression  is  the  best  basis  of  judgment.  Also, 
consider  only  one  set  of  words  at  a  time  —  don't  look  back  at  or  change 
your  previous  ratings. 

In  the  example  below,  the  three  words  can  be  considered  as  three 
pairs: (a) Word 1 - Word 2, (b) Word 2 - Word 3, and (c) Word 1 - Word 3.

If  you  think  that  Word  2  and  Word  3  are  the  most  similar  of  the  three  pairs, 
mark  a  plus  between  them  as  shown  below.  Similarly,  put  a  minus  between 
Word  1  and  Word  3  if  you  consider  them  as  the  least  similar.  By  elimination, 
of  course,  Word  1  and  Word  2  would  be  of  intermediate  similarity. 

Remember, 

(1) It is up to you to decide in what way the words are similar.

(2)  Work  quickly  but  carefully. 

(3) Consider only one set of words at a time when making a rating; don't
look back at or change your previous judgments.


            Word 1
           /      \
        ( )        (-)
         /          \
   Word 2 --(+)-- Word 3
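
The triad judgment described in these instructions can be read as an ordinal ranking of the three pairs within a set. The following is a minimal sketch of that reading, not the scoring procedure used in the study. The function name `score_triad` and the numeric codes (3 for the plus-marked pair, 2 for the unmarked pair, 1 for the minus-marked pair) are assumptions introduced here for illustration only.

```python
from itertools import combinations


def score_triad(words, most_similar, least_similar):
    """Rank the three pairs of one triad (hypothetical helper).

    Assumed codes: 3 = pair marked plus, 1 = pair marked minus,
    2 = unmarked pair, which is intermediate by elimination.
    """
    if len(set(words)) != 3:
        raise ValueError("a triad needs three distinct words")

    # The three unordered pairs that can be formed from the triad.
    pairs = [frozenset(p) for p in combinations(words, 2)]
    plus = frozenset(most_similar)
    minus = frozenset(least_similar)

    if plus not in pairs or minus not in pairs:
        raise ValueError("marked pairs must come from the triad")
    if plus == minus:
        raise ValueError("plus and minus must mark different pairs")

    ranks = {}
    for pair in pairs:
        if pair == plus:
            ranks[pair] = 3
        elif pair == minus:
            ranks[pair] = 1
        else:
            ranks[pair] = 2
    return ranks


# The worked example from the instructions: plus between Word 2 and
# Word 3, minus between Word 1 and Word 3.
example = score_triad(
    ("Word 1", "Word 2", "Word 3"),
    most_similar=("Word 2", "Word 3"),
    least_similar=("Word 1", "Word 3"),
)
```

In this sketch the pair Word 1 - Word 2 receives the intermediate code because it carries neither mark, which mirrors the "by elimination" step in the instructions.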
Appendix C

Instructions  for  Learning  Paired-Associate  List 

Initial  Instructions 

I want you to learn a list of word-pairs that I'm going to show
you. You're to try to learn the pairings of the words. I'll present each
pair  of  words  for  two  seconds  on  the  screens  in  front  of  you.  During  this 
presentation  period  I  want  you  to  study  the  pairings.  Try  to  learn  as  many 
of  the  pairings  as  possible  with  the  objective  of  learning  them  all.  After 
this  study  trial  the  screen  will  be  blank  for  a  few  seconds,  and  then  I'll 
test  you  to  see  how  many  of  the  pairings  you  can  remember.  Following  this 
test,  the  pairings  will  again  be  presented  for  two  seconds  each  for  you  to 
study,  followed  by  another  test.  So  the  procedure  runs,  study-test-study- 
test,  and  so  on. 

I'm going to test your memory for the pairings in two ways. In
the first type of test, called a recall test, I'll simply present the first
word  in  a  pairing,  and  you'll  have  four  seconds  in  which  to  try  to  remember 
and  call  out  the  second  word  of  the  pair.  So  every  time  you  see  only  the 
first  word  of  a  pair  presented,  try  to  remember  the  second  word  and  call 
it  out.  Unless  I  tell  you  otherwise,  a  recall  test  will  follow  every  study 
trial, so be prepared to respond when the screen goes blank, as you'll only
have  four  seconds  for  each  response. 

In  the  second  type  of  test,  called  a  recognition  test,  I'm  going 
to  scramble  some  of  the  pairs  and  see  if  you  can  recognize  which  ones  have 
been  changed.  In  other  words,  I'm  going  to  take  the  words  in  some  of  the 
pairs  that  I've  shown  you  earlier  and  switch  the  pairings  around.  I'll 
then  show  you,  one  at  a  time,  some  of  these  new  pairings  and  some  of  the 
old pairings. As I show you each test pair, I want you to give me your
estimate  of  the  percent  probability  that  the  pairing  has  been  changed.  You 
should  give  me  this  estimate  as  a  number  between  zero  and  one  hundred,  a 
large  number  indicating  that  you  think  that  it's  probable  that  the  pairing 
has been changed, and a small number showing that it's probable that the
pairing  has  not  been  changed.  This  procedure  is  the  same  as  one  that  a 
weather  forecaster  might  use  to  predict  the  probability  of  rain.  If  he  says 
there is a 90% probability of rain, he means that it's extremely likely
that it will rain, although there's a slight chance that it won't. If he
says there is a 60% probability of rain, he means that the chances of
rainfall are not as great, although he feels that it is slightly more likely to
rain than not. Similarly, if he gives a 40% probability he thinks that
there probably won't be any rain, but there are four chances out of ten that
there might be. If he gives a 10% probability, it is very unlikely that it
will  rain,  but  there  is  a  slight  chance.  So,  in  the  same  way,  when  you  see 
each  pair  on  the  recognition  test  I  want  you  to  give  me  a  number  between 
one  hundred  and  zero.  Give  a  large  number  when  you  think  that  it  is 
probable  that  the  pair  has  been  changed,  and  a  small  number  when  you  think 
that  it  hasn't  been  changed. 

I'll always warn you when a recognition test is coming up. You'll
have  as  much  time  as  you  want  on  the  recognition  test  to  give  your  response. 

Do  you  have  any  questions? 

Reminder Instructions

"This is a recognition test. Respond to each pair with a number
between zero and one hundred. Remember, if you think that it's probable
that the pair has been changed, give a large number. If you think that the
probability is small that it's been changed, give a small number."


